method generates high-throughput sequence-specific analysis of multiple RNAs or their corresponding cDNAs (gene transcript imaging
analysis). Another embodiment of the method produces a gene transcript imaging analysis by the use of high-throughput cDNA sequence
analysis. In addition, the gene transcript imaging can be used to detect or diagnose a particular biological state, disease, or condition
which is correlated to the relative abundance of gene transcripts in a given cell or population of cells. The invention provides a method
for comparing the gene transcript image analysis from two or more different biological specimens in order to distinguish between the two
specimens and identify one or more genes which are differentially expressed between the two specimens.






WO 95/20681

PCT/US95/01160



COMPARATIVE GENE TRANSCRIPT ANALYSIS

1. FIELD OF INVENTION
The present invention is in the field of molecular biology and computer science; more particularly, the present invention describes methods of analyzing gene transcripts and diagnosing the genetic expression of cells and tissue.

2. BACKGROUND OF THE INVENTION
Until very recently, the history of molecular biology has been written one gene at a time. Scientists have observed the cell's physical changes, isolated mixtures from the cell or its milieu, purified proteins, sequenced proteins and therefrom constructed probes to look for the corresponding gene.

Recently, different nations have set up massive projects to sequence the billions of bases in the human genome. These projects typically begin with dividing the genome into large portions of chromosomes and then determining the sequences of these pieces, which are then analyzed for identity with known proteins or portions thereof, known as motifs. Unfortunately, the majority of genomic DNA does not encode proteins and though it is postulated to have some effect on the cell's ability to make protein, its relevance to medical applications is not understood at this time.

A third methodology involves sequencing only the transcripts encoding the cellular machinery actively involved in making protein, namely the mRNA. The advantage is that the cell has already edited out all the non-coding DNA, and it is relatively easy to identify the protein-coding portion of the RNA. The utility of this approach was not immediately obvious to genomic researchers. In fact, when cDNA sequencing was initially proposed, the method was roundly denounced by those committed to genomic sequencing. For example, the head of the U.S. Human Genome project discounted cDNA sequencing as not valuable and refused to approve funding of projects.

In this disclosure, we teach methods for analyzing DNA, including cDNA libraries. Based on our analyses and research, we see each individual gene product as a "pixel" of information, which relates to the expression of that gene. We teach herein methods whereby the individual "pixels" of gene expression information can be combined into a single gene transcript "image," in which each of the individual genes can be visualized simultaneously, allowing relationships between the gene pixels to be easily visualized and understood.

We further teach a new method, which we call electronic subtraction. Electronic subtraction will enable the researcher to turn a single image into a moving picture, one which describes the temporality [...] of gene expression at the level of a cell or a whole tissue. It is that sense of "motion" of cellular machinery on the scale of a cell or organ which constitutes the new invention herein. This constitutes a new view into the process of living cell physiology, one which holds great promise to unveil new therapeutic and diagnostic approaches in medicine.

We teach another method which we call "electronic northern," which tracks the expression of a single gene across many types of cells and tissues.

Nucleic acids (DNA and RNA) carry within their sequence the hereditary information and are therefore the prime molecules of life. Nucleic acids are found [...] cells, tissues [...] over time under various conditions, treatments and regimes.

[...] genes. The differences among different types of cells reflect the expression of the 100,000 or so genes. Fundamental questions of biology could be answered by understanding which genes are transcribed and knowing the relative abundance of transcripts in different cells.
Previously, the art has only provided for the analysis of a few known genes at a time by standard molecular biology techniques such as PCR, northern blot analysis, or other types of DNA probe analysis such as in situ hybridization. Each of these methods allows one to analyze the transcription of only known genes and/or small numbers of genes at a time. Nucl. Acids Res. 19, 7097-7104 (1991); Nucl. Acids Res. 18, 4833-42 (1990); Nucl. Acids Res. 18, 2789-92 (1989); European J. Neuroscience 2, 1063-1073 (1990); Analytical Biochem. 182, 364-73 (1990); Genet. Annals Techn. Appl. 2, 64-70 (1990); GATA 8(4), 129-33 (1991); Proc. Natl. Acad. Sci. USA 85, 1696-1700 (1988); Nucl. Acids Res. 19, 1954 (1991); Proc. Natl. Acad. Sci. USA 88, 1943-47 (1991); Nucl. Acids Res. 19, 6123-27 (1991); Proc. Natl. Acad. Sci. USA 85, 5738-42 (1988); Nucl. Acids Res. 16, 10937 (1988).

Studies of the number and types of genes whose transcription is induced or otherwise regulated during cell processes such as activation, differentiation, aging, viral transformation, morphogenesis, and mitosis have been pursued for many years, using a variety of methodologies. One of the earliest methods was to isolate and analyze levels of the proteins in a cell, tissue, organ system, or even organisms both before and after the process of interest. One method of analyzing multiple proteins in a sample is using 2-dimensional gel electrophoresis, wherein proteins can be, in principle, identified and quantified as individual bands, and ultimately reduced to a discrete signal. At present, 2-dimensional analysis only resolves approximately 15% of the proteins. In order to positively analyze those bands which are resolved, each band must be excised from the membrane and subjected to protein sequence analysis using Edman degradation. Unfortunately, most of the bands were present in quantities too small to obtain a reliable sequence, and many of those bands contained more than one discrete protein. An additional difficulty is that many of the proteins were blocked at the amino-terminus, further complicating the sequencing process.
Analyzing differentiation at the gene transcription level has overcome many of these disadvantages and drawbacks, since the power of recombinant DNA technology allows amplification of signals containing very small amounts of material. The most common method, called "hybridization subtraction," involves isolation of mRNA from the biological specimen before (B) and after (A) the developmental process of interest, transcribing one set of mRNA into cDNA, subtracting specimen B from specimen A (mRNA from cDNA) by hybridization, and constructing a cDNA library from the non-hybridizing mRNA fractions. Many different groups have used this strategy successfully, and a variety of procedures have been published and improved upon using this same basic scheme. Nucl. Acids Res. 19, 7097-7104 (1991); Nucl. Acids Res. 18, 4833-42 (1990); Nucl. Acids Res. 18, 2789-92 (1989); European J. Neuroscience 2, 1063-1073 (1990); Analytical Biochem. 182, 364-73 (1990); Genet. Annals Techn. Appl. 2, 64-70 (1990); GATA 8(4), 129-33 (1991); Proc. Natl. Acad. Sci. USA 85, 1696-1700 (1988); Nucl. Acids Res. 19, 1954 (1991); Proc. Natl. Acad. Sci. USA 88, 1943-47 (1991); Nucl. Acids Res. 19, 6123-27 (1991); Proc. Natl. Acad. Sci. USA 85, 5738-42 (1988); Nucl. Acids Res. 16, 10937 (1988).

Although each of these techniques has particular strengths and weaknesses, there are still some limitations and undesirable aspects of these methods. First, the time and effort required to construct such libraries is quite large. Typically, a trained molecular biologist might expect construction and characterization of such a library to require 3 to 6 months, depending on the level of skill, experience, and luck. Second, the resulting subtraction libraries are typically inferior to the libraries constructed by standard methodology. A typical conventional cDNA library should have a clone complexity of at least 10^6 clones, and an average insert size of 1-3 kb. In contrast, subtracted libraries can have complexities of 10^2 or 10^3 and average insert sizes of 0.2 kb. Therefore, there can be a significant loss of clone and sequence information associated with such libraries. Third, this approach allows the researcher to capture only the genes induced in specimen A relative to specimen B, not vice-versa, nor does it easily allow comparison to a third specimen of interest (C). Fourth, this approach requires very large amounts (hundreds of micrograms) of "driver" mRNA (specimen B), which significantly limits the number and type of subtractions that are possible since many tissues and cells are very difficult to obtain in large quantities.

Fifth, the resolution of the subtraction is dependent upon the physical properties of DNA:DNA or RNA:DNA hybridization. The ability of a given sequence to find a hybridization match is dependent on its unique CoT value. The CoT value is a function of the number of copies (concentration) of the particular sequence, multiplied by the time of hybridization. It follows that for sequences which are abundant, hybridization events will occur very rapidly (low CoT value) while rare sequences will form duplexes at very high CoT values. CoT values which allow such rare sequences to form duplexes, and therefore be effectively selected, are difficult to achieve in a convenient time frame. Therefore, hybridization subtraction is simply not a useful technique with which to study relative levels of rare mRNA species. Sixth, this problem is further complicated by the fact that duplex formation is also dependent on the nucleotide base composition for a given sequence. Those sequences rich in G + C form stronger duplexes than those with high contents of A + T. Therefore, the former sequences will tend to be removed selectively by hybridization subtraction. Seventh, it is possible that hybridization between nonexact matches can occur. When this happens, the expression of a homologous gene may "mask" expression of a gene of interest, artificially skewing the results for that particular gene.
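The CoT relationship in the fifth point above (concentration multiplied by hybridization time) can be illustrated with a minimal sketch. The function names, the units, and the example numbers are assumptions made for illustration only; they are not taken from the disclosure.

```python
def cot_value(concentration, hours):
    """CoT as described in the text: sequence concentration times hybridization time."""
    return concentration * hours


def hours_to_reach(target_cot, concentration):
    """Hybridization time needed for a sequence at this concentration to reach target_cot."""
    if concentration <= 0:
        raise ValueError("concentration must be positive")
    return target_cot / concentration


# Hypothetical example: an abundant message present at 1000 relative copies
# and a rare message present at 1 relative copy, both needing the same CoT.
abundant_hours = hours_to_reach(100.0, 1000.0)
rare_hours = hours_to_reach(100.0, 1.0)
print(abundant_hours, rare_hours)
```

Under these assumed numbers the rare message needs far longer to reach the same CoT than the abundant one, which is the practical obstacle the text describes.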

Matsubara and Okubo proposed using partial cDNA sequences to establish expression profiles of genes which could be used in functional analyses of the human genome. Matsubara and Okubo warned against using random priming, as it creates multiple unique DNA fragments from individual mRNAs and may thus skew the analysis of the number of particular mRNAs per library. They sequenced randomly selected members from a 3'-directed cDNA library and established the frequency of appearance of the various ESTs. They proposed comparing lists of ESTs from various cell types to classify genes. Genes expressed in many different cell types were labeled housekeepers and those selectively expressed in certain cells were labeled cell-specific genes, even in the absence of the full sequence of the gene or the biological activity [...].

The present invention avoids the drawbacks of the prior art by providing a method to quantify the relative abundance of multiple gene transcripts in a given biological specimen by the use of high-throughput sequence-specific analysis of individual RNAs and/or their corresponding cDNAs.

The present invention offers several advantages over current protein discovery methods which attempt to isolate individual proteins based upon biological effects. The method of the instant invention provides for detailed diagnostic comparisons of cell profiles revealing numerous changes in the expression of individual transcripts.

The instant invention provides several advantages over current subtraction methods including a more complex library analysis (10[...] to 10[...] clones as compared to 10[...] clones), which allows identification of low abundance messages as well as enabling identification of messages which [...]. Large libraries are very routine to make in contrast to the [...] homologues [...].

This method is very convenient because it organizes a large quantity of data into a comprehensible, digestible format. The most significant differences are highlighted [...] subtraction [...] in depth [...].
The present invention provides several advantages over previous methods of electronic analysis of cDNA. The method is particularly powerful when more than 100 and preferably more than 1,000 gene transcripts are analyzed. In such a case, new low-frequency transcripts are discovered and tissue typed.

High resolution analysis of gene expression can be used directly as a diagnostic profile or to identify disease-specific genes for the development of more classic diagnostic approaches.

This process is defined as gene transcript frequency analysis. The resulting quantitative analysis of the gene transcripts is defined as comparative gene transcript analysis.

3. SUMMARY OF THE INVENTION
The invention is a method of analyzing a specimen containing gene transcripts comprising the steps of (a) producing a library of biological sequences; (b) generating a set of transcript sequences, where each of the transcript sequences in said set is indicative of a different one of the biological sequences of the library; (c) processing the transcript sequences in a programmed computer (in which a database of reference transcript sequences indicative of reference sequences is stored), to generate an identified sequence value for each of the transcript sequences, where each said identified sequence value is indicative of sequence annotation and a degree of match between one of the biological sequences of the library and at least one of the reference sequences; and (d) processing each said identified sequence value to generate final data values indicative of the number of times each identified sequence value is present in the library.
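Steps (c) and (d) above can be sketched in a few lines. This is an illustrative outline only: the reference database contents, the function names, and the use of an exact dictionary lookup in place of a real sequence comparison are all assumptions, not the program of the disclosure.

```python
from collections import Counter

# Hypothetical reference database: reference sequence -> annotation.
REFERENCE = {
    "ATGGCCAAG": "gene_A",
    "TTGACATCG": "gene_B",
}


def identify(sequence, reference=REFERENCE):
    """Step (c): return an identified sequence value (annotation, degree of match).

    An exact lookup stands in for real sequence matching, so only
    "exact" and "non-match" are produced in this sketch.
    """
    if sequence in reference:
        return (reference[sequence], "exact")
    return ("unidentified", "non-match")


def final_data_values(transcript_sequences, reference=REFERENCE):
    """Step (d): count how many times each identified sequence value occurs."""
    return Counter(identify(seq, reference) for seq in transcript_sequences)


library = ["ATGGCCAAG", "ATGGCCAAG", "TTGACATCG", "GGGGGGGGG"]
counts = final_data_values(library)
for value, n in counts.most_common():
    print(value, n)
```

A production system would replace `identify` with a similarity search that can also report the "similar" category of match.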

The invention also includes a method of comparing two specimens containing gene transcripts. The first specimen is processed as described above. The second specimen is used to produce a second library of biological sequences, which is used to generate a second set of transcript sequences, where each of the transcript sequences in the second set is indicative of one of the biological sequences of the second library. The second set of transcript sequences is processed in a programmed computer to generate a second set of identified sequence values, namely the further identified sequence values, each of which is indicative of a sequence annotation and includes a degree of match between one of the biological sequences of the second library and at least one of the reference sequences. The further identified sequence values are processed to generate further final data values indicative of the number of times each further identified sequence value is present in the second library. The final data values from the first specimen and the further final data values [...] of sequences, which [...].

In a further [...] which the mRNA is transcribed [...] determining [...] which cDNA copies [...] inserted into a suitable vector which is used to [...] strain [...] permitted to grow into clones [...] unique [...] sequence-specific method which identifies [...] to a clone is determined [...] sequence. The genes and their abundances are listed in order of abundance to produce a gene transcript image.
In a further embodiment, the relative abundance of the gene transcripts in one cell type or tissue is compared with the relative abundance of gene transcript numbers in a second cell type or tissue in order to identify the differences and similarities.

In a further embodiment, the method includes a system for analyzing a library of biological sequences including a means for receiving a set of transcript sequences, where each of the transcript sequences is indicative of a different one of the biological sequences of the library; and a means for processing the transcript sequences in a computer system in which a database of reference transcript sequences indicative of reference sequences is stored, wherein the computer is programmed with software for generating an identified sequence value for each of the transcript sequences, where each said identified sequence value is indicative of a sequence annotation and the degree of match between a different one of the biological sequences of the library and at least one of the reference sequences, and for processing each said identified sequence value to generate final data values indicative of the number of times each identified sequence value is present in the library.

In essence, the invention is a method and system for quantifying the relative abundance of gene transcripts in a biological specimen. The invention provides a method for comparing the gene transcript image from two or more different biological specimens in order to distinguish between the two specimens and identify one or more genes which are differentially expressed between the two specimens. Thus, this gene transcript image and its comparison can be used as a diagnostic. One embodiment of the method generates high-throughput sequence-specific analysis of multiple RNAs or their corresponding cDNAs: a gene transcript image. Another embodiment of the method produces the gene transcript imaging analysis by the use of high-throughput cDNA sequence analysis. In addition, two or more gene transcript images can be compared and used to detect or diagnose a particular biological state, disease, or condition which is correlated to the relative abundance of gene transcripts in a given cell or population of cells.



4.1. TABLES

Table 1 presents a detailed explanation of the letter codes utilized in Tables 2-5.

Table 2 lists the one hundred most [...] gene transcripts. It is a partial list of isolates from the [...] cDNA library prepared and sequenced as described [...] order of abundance in this table. The next column, labeled [...], [...] identification reference matching the sequence in the "entry" column number. Isolates that have not been sequenced are not present in Table 2. The next column, labeled [...], indicates the total number of [...] which have the same degree of match with the sequence of the reference transcript in the "entry" column. [...] name of the NIH GENBANK locus which corresponds to [...] sequence numbers. The "s" column indicates in [...] cases the species of the reference sequence. The code for column "s" is given in Table 1. The column labeled [...] in English explanation of the identity of the sequence corresponding to the NIH GENBANK locus name in the "entry" column.

Table 3 is a comparison of the top fifteen most abundant gene transcripts in normal [...].

Table 4 is a detailed summary of library subtraction analysis comparing the THP-1 and human macrophage cDNA sequences. In Table 4 the same code as in Table 2 is [...] (abundance number in the subtracted library), "rfend" (abundance number in the target library), and "ratio" (the target abundance number divided by the subtracted abundance number). As is clear from perusal of the table, when the abundance number in the subtractant library is "0", the target abundance number is divided by 0.05. This is a way of obtaining a result (not possible dividing by 0) and distinguishing the result from ratios of subtractant numbers of 1.
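The "ratio" column described for Table 4, including the substitution of 0.05 when the subtractant abundance is 0, can be sketched as follows. This is a hedged illustration and not the program of Table 5; the function names and the example counts are hypothetical.

```python
def abundance_ratio(target, subtractant, zero_substitute=0.05):
    """Target abundance divided by subtractant abundance.

    When the subtractant abundance is 0, divide by 0.05 instead, as the
    text describes, so a result exists and is distinguishable from a
    subtractant abundance of 1.
    """
    divisor = subtractant if subtractant != 0 else zero_substitute
    return target / divisor


def subtract_libraries(target_counts, subtractant_counts):
    """Return rows (gene, subtractant abundance, target abundance, ratio), highest ratio first."""
    rows = []
    for gene in sorted(set(target_counts) | set(subtractant_counts)):
        t = target_counts.get(gene, 0)
        s = subtractant_counts.get(gene, 0)
        rows.append((gene, s, t, abundance_ratio(t, s)))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Hypothetical abundance numbers for two libraries.
target = {"gene_A": 8, "gene_B": 2, "gene_C": 1}
subtractant = {"gene_A": 2, "gene_B": 2}
rows = subtract_libraries(target, subtractant)
for row in rows:
    print(row)
```

With these assumed counts, a transcript seen once in the target and never in the subtractant ranks above one seen eight times in the target and twice in the subtractant, which shows how strongly the 0.05 substitution weights transcripts absent from the subtractant library.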
Table 5 is the computer program, written in source code, for generating gene transcript subtraction profiles.

Table 6 is a partial listing of database entries used in the electronic northern blot analysis as provided by the present invention.

4.2. BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a chart summarizing data collected and stored regarding the library construction portion of sequence preparation and analysis.

Figure 2 is a diagram representing the sequence of operations performed by "abundance sort" software in a class of preferred embodiments of the inventive method.

Figure 3 is a block diagram of a preferred embodiment of the system of the invention.

Figure 4 is a more detailed block diagram of the bioinformatics process from new sequence (that has already been sequenced but not identified) to printout of the transcript imaging analysis and the provision of database subscriptions.

5. DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a method to compare the relative abundance of gene transcripts in different biological specimens by the use of high-throughput sequence-specific analysis of individual RNAs or their corresponding cDNAs (or alternatively, of data representing other biological sequences). This process is denoted herein as gene transcript imaging. The quantitative analysis of the relative abundance for a set of gene transcripts is denoted herein as "gene transcript image analysis" or "gene transcript frequency analysis". The present invention allows one to obtain a profile for gene transcription in any given population of cells or tissue from any type of organism. The invention can be applied to obtain a profile of a specimen consisting of a single cell (or clones of a single cell), or of many cells, or of tissue more complex than a single cell and containing multiple cell types, such as liver.

The invention has significant advantages in the fields of diagnostics, toxicology and pharmacology, to name a few. A highly sophisticated diagnostic test can be performed on the ill patient in whom a diagnosis has not been made. A biological specimen consisting of the patient's fluids or tissues is obtained, and the gene transcripts are isolated [...] to determine their identity. Optionally, the gene transcripts can be converted to cDNA. [...] gene transcripts are subjected to sequence-specific analysis and quantified. These gene transcript sequence abundances are compared against reference database sequence abundances including normal data sets for diseased and healthy patients. The patient has the disease(s) with which the patient's data set most closely correlates.

[...] gene transcript frequency analysis [...] used to differentiate [...] cells or tissues [...] diseased cells or tissues, just as it [...] differences between normal monocytes and activated macrophages in Table 3.
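The diagnostic comparison described above, in which a patient's abundance data set is matched to the reference data set it most closely correlates with, might be sketched as below. The choice of Pearson correlation, the function names, and every number are assumptions for illustration; the text does not specify a particular correlation measure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance vectors.

    Assumes neither vector is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def closest_reference(patient, references):
    """Return (best reference name, all scores) for a patient abundance profile."""
    genes = sorted(patient)
    patient_vec = [patient[g] for g in genes]
    scores = {
        name: pearson(patient_vec, [profile.get(g, 0) for g in genes])
        for name, profile in references.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical transcript abundances per gene.
patient = {"g1": 10, "g2": 1, "g3": 5}
references = {
    "healthy": {"g1": 1, "g2": 10, "g3": 5},
    "disease_X": {"g1": 9, "g2": 2, "g3": 5},
}
best, scores = closest_reference(patient, references)
print(best)
```

In practice the profiles would cover hundreds or thousands of transcripts, and abundances would normally be expressed as fractions of each library so that libraries of different sizes are comparable.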

In toxicology, a fundamental question is which tests are most effective in predicting or detecting a toxic effect. Gene transcript imaging provides highly detailed information on the cell and tissue environment, some of which would not be obvious in conventional, less detailed screening methods. The gene transcript image is a more powerful method to predict drug toxicity and efficacy. Similar benefits accrue in the use of this tool in pharmacology. The gene transcript image can be used selectively to look at protein categories which are expected to be affected, for example, enzymes which detoxify toxins.

In an alternative embodiment, comparative gene transcript frequency analysis is used to differentiate between cancer cells which respond to anti-cancer agents and those which do not respond. Examples of anti-cancer agents are tamoxifen, vincristine, vinblastine, podophyllotoxins, etoposide, teniposide, cisplatin, biologic response modifiers such as interferon, IL-2, GM-CSF, enzymes, hormones and the like. This method also provides a means for sorting the gene transcripts by functional category. In the case of cancer cells, transcription factors or other essential regulatory molecules are very important categories to analyze across different libraries.

In yet another embodiment, comparative gene transcript frequency analysis is used to differentiate between control liver cells and liver cells isolated from patients treated with experimental drugs like FIAU to distinguish between pathology caused by the underlying disease and that caused by the drug.

In yet another embodiment, comparative gene transcript frequency analysis is used to differentiate between brain tissue from patients treated and untreated with lithium.

In a further embodiment, comparative gene transcript frequency analysis is used to differentiate between cyclosporin and FK506-treated cells and normal cells.

In a further embodiment, comparative gene transcript frequency analysis is used to differentiate between virally infected (including HIV-infected) human cells and uninfected human cells. Gene transcript frequency analysis is also used to rapidly survey gene transcripts in HIV-resistant, HIV-infected, and HIV-sensitive cells. Comparison of gene transcript abundance will indicate the success of treatment and/or new avenues to study.

In a further embodiment, comparative gene transcript frequency analysis is used to differentiate between bronchial lavage fluids from healthy and unhealthy patients with a variety of ailments.

In a further embodiment, comparative gene transcript frequency analysis is used to differentiate between cell, plant, microbial and animal mutants and wild-type species. In addition, the transcript abundance program is adapted to permit the scientist to evaluate the transcription of one gene in many different tissues. Such comparisons could identify deletion mutants which do not produce a gene product and point mutants which produce a less abundant or otherwise different message. Such mutations can affect basic biochemical and pharmacological processes, such as mineral nutrition and metabolism, and can be isolated by means known to those skilled in the art. Thus, crops with improved yields, pest resistance and other factors can be developed.

In a further embodiment, comparative gene transcript frequency analysis is used for an interspecies comparative analysis which would allow for the selection of better pharmacologic animal models. In this embodiment, humans and other animals (such as a mouse), or their cultured cells, are treated with a specific test agent. The relative sequence abundance of each cDNA population is determined. If the animal test system is a good model, homologous genes in the animal cDNA population should change expression similarly to those in human cells. If side effects are detected with the drug, a detailed transcript abundance analysis will be performed to survey gene transcript changes. Models will then be evaluated by comparing basic physiological changes.

In a further embodiment, comparative gene transcript frequency analysis is used in a clinical setting to give a highly detailed gene transcript profile of a patient's cells or tissue (for example, a blood sample). In particular, gene transcript frequency analysis is used to give a high resolution gene expression profile of a diseased state or condition.

In the preferred embodiment, the method utilizes
high-throughput cDNA sequencing to identify specific
transcripts of interest. The generated cDNA and deduced
amino acid sequences are then extensively compared with
GENBANK and other sequence data banks as described below.
The method offers several advantages over current protein
discovery by two-dimensional gel methods which try to
identify individual proteins involved in a particular
biological effect. Here, detailed comparisons of profiles
of activated and inactive cells reveal numerous changes in
the expression of individual transcripts. After it is
determined if the sequence is an "exact" match, similar or
a non-match, the sequence is entered into a database.
Next, the numbers of copies of cDNA corresponding to each
gene are tabulated. Although this can be done slowly and
arduously, if at all, by human hand from a printout of all
entries, a computer program is a useful and rapid way to
tabulate this information. The numbers of cDNA copies
(optionally divided by the total number of sequences in the
data set) provide a picture of the relative abundance of
transcripts for each corresponding gene. The list of
represented genes can then be sorted by abundance in the
cDNA population. A multitude of additional types of
comparisons or dimensions are possible and are exemplified
below.
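
The tabulation and sorting described above can be sketched as
follows. This is only an illustration, assuming each sequenced
clone has already been reduced to a gene identifier; the
function name tabulate_abundance and the sample identifiers are
hypothetical and are not taken from the program described
herein.

```python
from collections import Counter


def tabulate_abundance(gene_ids):
    """Count cDNA copies per gene and sort genes by abundance.

    gene_ids holds one identifier per sequenced clone (an assumed
    input format). Returns (gene, copies, relative abundance)
    tuples, most abundant first; ties are ordered by name.
    """
    counts = Counter(gene_ids)
    total = len(gene_ids)
    rows = [(gene, n, n / total) for gene, n in counts.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


if __name__ == "__main__":
    clones = ["actin", "ef1a", "actin", "unknown", "actin", "ef1a"]
    for gene, copies, fraction in tabulate_abundance(clones):
        print(f"{gene:10s} {copies:3d} {fraction:.3f}")
```

Dividing by the total number of sequences is the optional
normalization mentioned in the text; the raw copy numbers are
kept alongside so that either figure can be reported.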

An alternate method of producing a gene transcript
image includes the steps of obtaining a mixture of test
mRNA and providing a representative array of unique probes
whose sequences are complementary to at least some of the
test mRNAs. Next, a fixed amount of the test mRNA is added
to the arrayed probes. The test mRNA is incubated with the
probes for a sufficient time to allow hybrids of the test
mRNA and probes to form. The mRNA-probe hybrids are
detected and the quantity determined. The hybrids are
identified by their location in the probe array. The
quantity of each hybrid is summed to give a population
number. Each hybrid quantity is divided by the population
number to provide a set of relative abundance data termed a
gene transcript image analysis.
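
The normalization step of this alternate method might look
like the following sketch. It assumes the detected hybrid
quantities are available as numbers keyed by array location;
the (row, column) keys and the function name transcript_image
are illustrative assumptions, not part of the described method.

```python
def transcript_image(hybrid_quantities):
    """Divide each hybrid quantity by the population number.

    hybrid_quantities maps an array location, here a
    (row, column) tuple, to the detected quantity of mRNA-probe
    hybrid at that location.
    """
    population = sum(hybrid_quantities.values())
    if population == 0:
        raise ValueError("no hybrids were detected")
    return {loc: q / population for loc, q in hybrid_quantities.items()}


if __name__ == "__main__":
    detected = {(0, 0): 30.0, (0, 1): 10.0, (1, 0): 60.0}
    print(transcript_image(detected))
```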

6. EXAMPLES

The examples below are provided to illustrate the 
subject invention. These examples are provided by way of 
illustration and are not included for the purpose of 
limiting the invention. 

6.1. TISSUE SOURCES AND CELL LINES

For analysis with the computer program claimed herein,
biological sequences can be obtained from virtually any
source. Most popular are tissues obtained from the human
body. Tissues can be obtained from any organ of the body,
any age donor, any abnormality or any immortalized cell
line. Immortal cell lines may be preferred in some
instances because of their purity of cell type; other
tissue samples invariably include mixed cell types. A
special technique is available to take a single cell (for
example, a brain cell) and harness the cellular machinery
to grow up sufficient cDNA for sequencing by the techniques
and analysis described herein (U.S. Patent Nos. 5,021,335
and 5,168,038, which are incorporated by reference). The
examples given herein utilized the following immortalized
cell lines: monocyte-like U-937 cells, activated
macrophage-like THP-1 cells, induced vascular endothelial
cells (HUVEC cells) and mast cell-like HMC-1 cells.

The U-937 cell line is a human histiocytic lymphoma
cell line with monocyte characteristics, established from
malignant cells obtained from the pleural effusion of a
patient with diffuse histiocytic lymphoma (Sundstrom, C.
and Nilsson, K. (1976) Int. J. Cancer 17:565). U-937 is
one of only a few human cell lines with the morphology,
cytochemistry, surface receptors and monocyte-like
characteristics of histiocytic cells. These cells can be
induced to terminal monocytic differentiation and will
express new cell surface molecules when activated with
supernatants from human mixed lymphocyte cultures. Upon
this type of in vitro activation, the cells undergo
morphological and functional changes, including
augmentation of antibody-dependent cellular cytotoxicity
(ADCC) against erythroid and tumor target cells (one of the
principal functions of macrophages). Activation of U-937
cells with phorbol 12-myristate 13-acetate (PMA) in vitro
stimulates the production of several compounds, including
prostaglandins, leukotrienes and platelet-activating factor
(PAF), which are potent inflammatory mediators. Thus,
U-937 is a cell line that is well suited for the
identification and isolation of gene transcripts associated
with normal monocytes.
The HUVEC cell line is a normal, homogeneous, well
characterized, early passage endothelial cell culture from
human umbilical vein (Cell Systems Corp., 12815 NE 124th
Street, Kirkland, WA 98034). Only gene transcripts from
induced, or treated, HUVEC cells were sequenced. One batch
of 1 x 10^8 cells was treated for 5 hours with 1 U/ml rIL-1b
and 100 ng/ml E. coli lipopolysaccharide (LPS) endotoxin
prior to harvesting. A separate batch of 2 x 10^8 cells was
treated at confluence with 4 U/ml TNF and 2 U/ml
interferon-gamma (IFN-gamma) prior to harvesting.

THP-1 is a human leukemic cell line with distinct
monocytic characteristics. This cell line was derived from
the blood of a 1-year-old boy with acute monocytic leukemia
(Tsuchiya, S. et al. (1980) Int. J. Cancer: 171-76). The
following cytological and cytochemical criteria were used
to determine the monocytic nature of the cell line: 1) the
presence of alpha-naphthyl butyrate esterase activity which
could be inhibited by sodium fluoride; 2) the production of
lysozyme; 3) the phagocytosis of latex particles and
sensitized SRBC (sheep red blood cells); and 4) the ability
of mitomycin C-treated THP-1 cells to activate
T-lymphocytes following ConA (concanavalin A) treatment.
Morphologically, the cytoplasm contained small azurophilic
granules and the nucleus was indented and irregularly
shaped with deep folds. The cell line had Fc and C3b
receptors, probably functioning in phagocytosis. THP-1
cells treated with the tumor promoter
12-O-tetradecanoyl-phorbol-13 acetate (TPA) stop
proliferating and differentiate into macrophage-like cells
which mimic native monocyte-derived macrophages in several
respects. Morphologically, as the cells change shape, the
nucleus becomes more irregular and additional phagocytic
vacuoles appear in the cytoplasm. The differentiated THP-1
cells also exhibit an increased adherence to tissue culture
plastic.

HMC-1 cells (a human mast cell line) were established
from the peripheral blood of a Mayo Clinic patient with
mast cell leukemia (Leukemia Res. (1988) 12:345-55). The
cultured cells looked similar to immature cloned murine
mast cells, contained histamine, and stained positively for
chloroacetate esterase, amino caproate esterase, eosinophil
major basic protein (MBP) and tryptase. The HMC-1 cells
have, however, lost the ability to synthesize normal IgE
receptors. HMC-1 cells also possess a 10;16 translocation,
present in cells initially collected by leukapheresis from
the patient and not an artifact of culturing. Thus, HMC-1
cells are a good model for mast cells.

6.2. CONSTRUCTION OF cDNA LIBRARIES

For inter-library comparisons, the libraries must be
prepared in similar manners. Certain parameters appear to
be particularly important to control. One such parameter
is the method of isolating mRNA. It is important to use
the same conditions to remove DNA and heterogeneous nuclear
RNA from comparison libraries. Size fractionation of cDNA
must be carefully controlled. The same vector preferably
should be used for preparing libraries to be compared. At
the very least, the same type of vector (e.g.,
unidirectional vector) should be used to assure a valid
comparison. A unidirectional vector may be preferred in
order to more easily analyze the output.

It is preferred to prime only with oligo dT
unidirectional primer in order to obtain only one clone per
mRNA transcript when obtaining cDNAs. However, it is
recognized that employing a mixture of oligo dT and random
primers can also be advantageous because such a mixture
results in more sequence diversity when gene discovery also
is a goal. Similar effects can be obtained with DR2
(Clontech) and HXLOX (US Biochemical) and also vectors from
Invitrogen and Novagen. These vectors have two
requirements. First, there must be primer sites for
commercially available primers such as T3 or M13 reverse
primers. Second, the vector must accept inserts up to 10
kb.

It also is important that the clones be randomly 
sampled, and that a significant population of clones is 
used. Data have been generated with 5,000 clones; however, 
if very rare genes are to be obtained and/or their relative
abundance determined, as many as 100,000 clones from a
single library may need to be sampled. Size fractionation
of cDNA also must be carefully controlled. Alternately,
plaques can be selected rather than clones.

Besides the Uni-ZAP™ vector system by Stratagene
disclosed below, it is now believed that other similarly
unidirectional vectors also can be used. For example, it
is believed that such vectors include, but are not limited
to, DR2 (Clontech) and HXLOX (U.S. Biochemical).

Preferably, the details of library construction (as
shown in Figure 1) are collected and stored in a database
for later retrieval relative to the sequences being
compared. Fig. 1 shows important information regarding the
library collaborator or cell or cDNA supplier,
pretreatment, biological source, culture, mRNA preparation
and cDNA construction. Similarly detailed information
about the other steps is beneficial in analyzing sequences
and libraries in depth.

RNA must be harvested from cells and tissue samples
and cDNA libraries are subsequently constructed. cDNA
libraries can be constructed according to techniques known
in the art. (See, for example, Maniatis, T. et al. (1982)
Molecular Cloning, Cold Spring Harbor Laboratory, New
York). cDNA libraries may also be purchased. The U-937
cDNA library (catalog No. 937207) was obtained from
Stratagene, Inc., 11099 N. Torrey Pines Rd., La Jolla, CA
92037.

The THP-1 cDNA library was custom constructed by
Stratagene from THP-1 cells cultured 48 hours with 100 nM
TPA and 4 hours with 1 μg/ml LPS. The human mast cell
HMC-1 cDNA library was also custom constructed by Stratagene
from cultured HMC-1 cells. The HUVEC cDNA library was
custom constructed by Stratagene from two batches of
induced HUVEC cells which were separately processed.

Essentially, all the libraries were prepared in the
same manner. First, poly(A+)RNA (mRNA) was purified. For
the U-937 and HMC-1 RNA, cDNA synthesis was only primed
with oligo dT. For the THP-1 and HUVEC RNA, cDNA synthesis
was primed separately with both oligo dT and random
hexamers, and the two cDNA libraries were treated
separately. Synthetic adaptor oligonucleotides were
ligated onto cDNA ends enabling its insertion into the
Uni-ZAP™ vector system (Stratagene), allowing high
efficiency unidirectional (sense orientation) lambda
library construction and the convenience of a plasmid
system with blue/white color selection to detect clones
with cDNA insertions. Finally, the two libraries were
combined into a single library by mixing equal numbers of
bacteriophage.

The libraries can be screened with either DNA probes
or antibody probes and the pBluescript® phagemid
(Stratagene) can be rapidly excised in vivo. The phagemid
allows the use of a plasmid system for easy insert
characterization, sequencing, site-directed mutagenesis,
the creation of unidirectional deletions and expression of
fusion proteins. The custom-constructed library phage
particles were infected into E. coli host strain XL1-Blue®
(Stratagene), which has a high transformation efficiency,
increasing the probability of obtaining rare,
under-represented clones in the cDNA library.

6.3. ISOLATION OF cDNA CLONES

The phagemid forms of individual cDNA clones were
obtained by the in vivo excision process, in which the host
bacterial strain was coinfected with both the lambda
library phage and an f1 helper phage. Proteins derived
from both the library-containing phage and the helper phage
nicked the lambda DNA, initiated new DNA synthesis from
defined sequences on the lambda target DNA and created a
smaller, single stranded circular phagemid DNA molecule
that included all DNA sequences of the pBluescript® plasmid
and the cDNA insert. The phagemid DNA was secreted from
the cells and purified, then used to re-infect fresh host
cells, where the double stranded phagemid DNA was produced.
Because the phagemid carries the gene for beta-lactamase,
the newly-transformed bacteria are selected on medium
containing ampicillin.

Phagemid DNA was purified using the Magic Minipreps™
DNA Purification System (Promega catalogue #A7100. Promega
Corp., 2800 Woods Hollow Rd., Madison, WI 53711). This
small-scale process provides a simple and reliable method
for lysing the bacterial cells and rapidly isolating
purified phagemid DNA using a proprietary DNA-binding
resin. The DNA was eluted from the purification resin
already prepared for DNA sequencing and other analytical
manipulations.

Phagemid DNA was also purified using the QIAwell-8
Plasmid Purification System from the QIAGEN® DNA
Purification System (QIAGEN Inc., 9259 Eton Ave.,
Chatsworth, CA 91311). This product line provides a
convenient, rapid and reliable high-throughput method for
lysing the bacterial cells and isolating highly purified
phagemid DNA using QIAGEN anion-exchange resin particles
with EMPORE™ membrane technology from 3M in a multiwell
format. The DNA was eluted from the purification resin
already prepared for DNA sequencing and other analytical
manipulations.

An alternate method of purifying phagemid has recently
become available. It utilizes the Miniprep Kit (Catalog
No. 7746B, available from Advanced Genetic Technologies
Corp., 19212 Orbit Drive, Gaithersburg, Maryland). This
kit is in the 96-well format and provides enough reagents
for 960 purifications. Each kit is provided with a
recommended protocol, which has been employed except for
the following changes. First, the 96 wells are each filled
with only 1 ml of sterile terrific broth with carbenicillin
at 25 mg/L and glycerol at 0.4%. After the wells are
inoculated, the bacteria are cultured for 24 hours and
lysed with 60 μl of lysis buffer. A centrifugation step
(2900 rpm for 5 minutes) is performed before the contents
of the block are added to the primary filter plate. The
optional step of adding isopropanol to TRIS buffer is not
routinely performed. After the last step in the protocol,
samples are transferred to a Beckman 96-well block for
storage.

Another new DNA purification system is the WIZARD™
product line which is available from Promega (catalog No.
A7071) and may be adaptable to the 96-well format.

6.4. SEQUENCING OF cDNA CLONES

The cDNA inserts from random isolates of the U-937 and
THP-1 libraries were sequenced in part. Methods for DNA
sequencing are well known in the art. Conventional
enzymatic methods employ DNA polymerase Klenow fragment,
Sequenase® or Taq polymerase to extend DNA chains from an
oligonucleotide primer annealed to the DNA template of
interest. Methods have been developed for the use of both
single- and double-stranded templates. The chain
termination reaction products are usually electrophoresed
on urea-acrylamide gels and are detected either by
autoradiography (for radionuclide-labeled precursors) or by
fluorescence (for fluorescent-labeled precursors). Recent
improvements in mechanized reaction preparation, sequencing
and analysis using the fluorescent detection method have
permitted expansion in the number of sequences that can be
determined per day (such as the Applied Biosystems 373 and
377 DNA sequencers, Catalyst 800). Currently with the
system as described, read lengths range from 250 to 400
bases and are clone dependent. Read length also varies
with the length of time the gel is run. In general, the
shorter runs tend to truncate the sequence. A minimum of
only about 25 to 50 bases is necessary to establish the
identification and degree of homology of the sequence.
Gene transcript imaging can be used with any
sequence-specific method, including, but not limited to,
hybridization, mass spectroscopy, capillary electrophoresis
and 505 gel electrophoresis.

6.5. HOMOLOGY SEARCHING OF cDNA CLONE AND
DEDUCED PROTEIN

Using the nucleotide sequences derived from the cDNA 
clones as query sequences (sequences of a Sequence 
Listing) , databases containing previously identified 
sequences are searched for areas of homology (similarity).
Examples of such databases include Genbank and EMBL. We 
next describe examples of two homology search algorithms 
that can be used, and then describe the subsequent 
computer-implemented steps to be performed in accordance 
with preferred embodiments of the invention. 

In the following description of the computer-
implemented steps of the invention, the word "library"
denotes a set (or population) of biological specimen
nucleic acid sequences. A "library" can consist of cDNA
sequences, RNA sequences, or the like, which characterize a
biological specimen. The biological specimen can consist
of cells of a single human cell type (or can be any of the
other above-mentioned types of specimens). We contemplate
that the sequences in a library have been determined so as
to accurately represent or characterize a biological
specimen (for example, they can consist of representative
cDNA sequences from clones of RNA taken from a single human
cell).

In the following description of the computer-
implemented steps of the invention, the expression
"database" denotes a set of stored data which represent a
collection of sequences, which in turn represent a
collection of biological reference materials. For example,
a database can consist of data representing many stored
cDNA sequences which are in turn representative of human
cells infected with various viruses, cells of humans of
various ages, cells from different mammalian species, and
so on.

In preferred embodiments, the invention employs a
computer programmed with software (to be described) for
performing the following steps:

(a) processing data indicative of a library of cDNA
sequences (generated as a result of high-throughput cDNA
sequencing or other method) to determine whether each
sequence in the library matches a DNA sequence of a
reference database of DNA sequences (and if so, identifying
the reference database entry which matches the sequence and
indicating the degree of match between the reference
sequence and the library sequence) and assigning an
identified sequence value based on the sequence annotation
and degree of match to each of the sequences in the
library;

(b) for some or all entries of the database,
tabulating the number of matching identified sequence
values in the library (although this can be done by human
hand from a printout of all entries, we prefer to perform
this step using computer software to be described below),
thereby generating a set of "final data values" or
"abundance numbers"; and

(c) if the libraries are different sizes, dividing
each abundance number by the total number of sequences in
the library, to obtain a relative abundance number for each
identified sequence value (i.e., a relative abundance of
each gene transcript).

The list of identified sequence values (or genes
corresponding thereto) can then be sorted by abundance in
the cDNA population. A multitude of additional types of
comparisons or dimensions are possible.

For example (to be described below in greater detail),
steps (a) and (b) can be repeated for two different
libraries (sometimes referred to as a "target" library and
a "subtractant" library). Then, for each identified
sequence value (or gene transcript), a "ratio" value is
obtained by dividing the abundance number (for that
identified sequence value) for the target library, by the
abundance number (for that identified sequence value) for
the subtractant library.
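
The ratio calculation can be illustrated with the sketch below.
It assumes each library has already been reduced to a mapping
from identified sequence value to abundance number. The
function name ratio_values is hypothetical, and returning None
for a transcript that is absent from the subtractant library
is an assumption made here for illustration; the text above
does not say how that case is treated.

```python
def ratio_values(target, subtractant):
    """Divide target abundance by subtractant abundance per gene.

    Both arguments map an identified sequence value to its
    abundance number. A gene missing from the subtractant
    library gets None (an assumed convention, see lead-in).
    """
    ratios = {}
    for gene, target_abundance in target.items():
        sub_abundance = subtractant.get(gene, 0)
        ratios[gene] = target_abundance / sub_abundance if sub_abundance else None
    return ratios


if __name__ == "__main__":
    target = {"il1b": 12, "actin": 40, "novel": 3}
    subtractant = {"il1b": 2, "actin": 40}
    print(ratio_values(target, subtractant))
```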

In fact, subtraction may be carried out on multiple
libraries. It is possible to add the transcripts from
several libraries (for example, three) and then to divide
them by another set of transcripts from multiple libraries
(again, for example, three). Notation for this operation
may be abbreviated as (A+B+C)/(D+E+F), where the capital
letters each indicate an entire library. Optionally the
abundance numbers of transcripts in the summed libraries
may be divided by the total sample size before subtraction.
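
A sketch of the (A+B+C)/(D+E+F) operation follows, again
assuming each library is a mapping from transcript to abundance
number. The function name summed_ratio and the normalize flag
are illustrative assumptions; transcripts absent from the
denominator set are simply omitted, which is a choice made for
this example only.

```python
from collections import Counter


def summed_ratio(numerator_libs, denominator_libs, normalize=False):
    """Compute (A+B+C)/(D+E+F) per transcript.

    Each library maps a transcript to its abundance number.
    With normalize=True, each summed set is first divided by
    its total sample size.
    """
    num = sum((Counter(lib) for lib in numerator_libs), Counter())
    den = sum((Counter(lib) for lib in denominator_libs), Counter())
    if normalize:
        num_total, den_total = sum(num.values()), sum(den.values())
        num = {g: c / num_total for g, c in num.items()}
        den = {g: c / den_total for g, c in den.items()}
    # Only transcripts present in the denominator set get a ratio.
    return {g: num[g] / den[g] for g in num if den.get(g)}


if __name__ == "__main__":
    a, b, c = {"x": 2, "y": 1}, {"x": 1}, {"y": 3}
    d, e, f = {"x": 1}, {"x": 2, "y": 2}, {"z": 5}
    print(summed_ratio([a, b, c], [d, e, f]))
    print(summed_ratio([a, b, c], [d, e, f], normalize=True))
```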

Unlike standard hybridization technology which permits
a single subtraction of two libraries, once one has
processed a set of library transcript sequences and stored
them in the computer, any number of subtractions can be
performed on the library. For example, by this method,
ratio values can be obtained by dividing relative abundance
values in a first library by corresponding values in a
second library and vice versa.

In variations on step (a), the library consists of
nucleotide sequences derived from cDNA clones. Examples of
databases which can be searched for areas of homology
(similarity) in step (a) include the commercially available
databases known as Genbank (NIH), EMBL (European Molecular
Biology Labs, Germany), and GENESEQ (Intelligenetics,
Mountain View, California).

One homology search algorithm which can be used to
implement step (a) is the algorithm described in the paper
by D.J. Lipman and W.R. Pearson, entitled "Rapid and
Sensitive Protein Similarity Searches," Science, 227:1435
(1985). In this algorithm, the homologous regions are
searched in a two-step manner. In the first step, the
highest homologous regions are determined by calculating a
matching score using a homology score table. The parameter
"Ktup" is used in this step to establish the minimum window
size to be shifted for comparing two sequences. Ktup also
sets the number of bases that must match to extract the
highest homologous region among the sequences. In this
step, no insertions or deletions are applied and the
homology is displayed as an initial (INIT) value.

In the second step, the homologous regions are aligned
to obtain the highest matching score by inserting a gap in
order to add a probable deleted portion. The matching
score obtained in the first step is recalculated using the
homology score table and the insertion score table to an
optimized (OPT) value in the final output.
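
The ungapped first step can be illustrated with a greatly
simplified sketch. It is not the published Lipman and Pearson
implementation: instead of a homology score table it merely
counts identical words of length ktup that fall on the same
diagonal, and the function name best_diagonal is hypothetical.

```python
from collections import defaultdict


def best_diagonal(query, reference, ktup=4):
    """Find the diagonal sharing the most identical k-tuples.

    Returns (diagonal, hits), where diagonal is the reference
    position minus the query position, or (None, 0) when no
    word of length ktup is shared. No gaps are considered.
    """
    index = defaultdict(list)
    for j in range(len(reference) - ktup + 1):
        index[reference[j:j + ktup]].append(j)
    hits = defaultdict(int)
    for i in range(len(query) - ktup + 1):
        for j in index.get(query[i:i + ktup], ()):
            hits[j - i] += 1
    if not hits:
        return None, 0
    # Prefer the most hits; break ties toward the main diagonal.
    diagonal = max(hits, key=lambda d: (hits[d], -abs(d)))
    return diagonal, hits[diagonal]


if __name__ == "__main__":
    print(best_diagonal("ACGTACGGTC", "TTTACGTACGGTCAA", ktup=4))
```

A larger ktup makes the scan faster but less sensitive, since
longer runs of identical bases are required before a region is
considered at all.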

DNA homologies between two sequences can be examined
graphically using the Harr method of constructing dot
matrix homology plots (Needleman, S.B. and Wunsch, C.D., J.
Mol. Biol. 48:443 (1970)). This method produces a
two-dimensional plot which can be useful in determining
regions of homology versus regions of repetition.
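
A text-mode dot matrix plot can be sketched in a few lines.
This is a generic illustration of the idea and not the plotting
routine of any program named herein; the function name
dot_matrix and the window parameter are assumptions.

```python
def dot_matrix(seq_a, seq_b, window=3):
    """Return rows of a dot matrix as strings.

    A '*' at row i, column j means the window of seq_a starting
    at i is identical to the window of seq_b starting at j.
    """
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = ""
        for j in range(len(seq_b) - window + 1):
            same = seq_a[i:i + window] == seq_b[j:j + window]
            row += "*" if same else "."
        rows.append(row)
    return rows


if __name__ == "__main__":
    print("\n".join(dot_matrix("ACGTTGCA", "ACGTTGCA", window=3)))
    print()
    print("\n".join(dot_matrix("ATATAT", "ATATAT", window=2)))
```

In the first plot a single unbroken diagonal indicates a region
of homology; in the second, the parallel diagonals are the
signature of a repeated sequence.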

However, in a class of preferred embodiments, step (a)
is implemented by processing the library data in the
commercially available computer program known as the
INHERIT 670 Sequence Analysis System, available from
Applied Biosystems, Inc. (Foster City, California),
including the software known as the Factura software (also
available from Applied Biosystems Inc.). The Factura
program preprocesses each library sequence to "edit out"
portions thereof which are not likely to be of interest,
such as the vector used to prepare the library. Additional
sequences which can be edited out or masked (ignored by the
search tools) include but are not limited to the polyA tail
and repetitive GAG and CCC sequences. A low-end search
program can be written to mask out such "low-information"
sequences, or programs such as BLAST can ignore the
low-information sequences.
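
A masking step of this kind could be sketched as below. This
is not the Factura program; the function name
mask_low_information, the use of "N" as the masking character,
and the thresholds (a terminal run of at least ten A residues,
at least three consecutive GAG or CCC repeats) are all
assumptions chosen for illustration.

```python
import re


def mask_low_information(seq, min_repeats=3, min_polya=10):
    """Replace a polyA tail and GAG/CCC repeats with 'N'.

    Thresholds are illustrative assumptions, not values taken
    from any program described in the text.
    """
    def mask(match):
        return "N" * len(match.group())

    seq = seq.upper()
    # Mask a run of A residues at the 3' end of the read.
    seq = re.sub(r"A{%d,}$" % min_polya, mask, seq)
    # Mask tandem GAG or CCC repeats anywhere in the read.
    repeats = r"(?:GAG){%d,}|(?:CCC){%d,}" % (min_repeats, min_repeats)
    return re.sub(repeats, mask, seq)


if __name__ == "__main__":
    read = "ACGT" + "GAG" * 3 + "TT" + "CCC" * 3 + "GT" + "A" * 12
    print(mask_low_information(read))
```

Masking rather than deleting keeps the coordinates of the
remaining bases unchanged, which can simplify later comparison
with the unedited read.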

In the algorithm implemented by the INHERIT 670
Sequence Analysis System, the Pattern Specification
Language (developed by TRW Inc.) is used to determine
regions of homology. "There are three parameters that
determine how INHERIT analysis runs sequence comparisons:
window size, window offset and error tolerance. Window
size specifies the length of the segments into which the
query sequence is subdivided. Window offset specifies
where to start the next segment [to be compared], counting
from the beginning of the previous segment. Error
tolerance specifies the total number of insertions,
deletions, and/or substitutions that are tolerated over the
specified word length. Error tolerance may be set to any
integer between 0 and 6. The default settings are window
tolerance=20, window offset=10 and error tolerance=3."
INHERIT Analysis Users Manual, pp. 2-15. Version 1.0,
Applied Biosystems, Inc., October 1991.
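
How the three parameters interact can be illustrated with the
sketch below. It does not use the Pattern Specification
Language and is not the INHERIT algorithm; it substitutes a
plain dynamic-programming edit count, and all function names
are hypothetical.

```python
def windows(query, window_size, window_offset):
    """Subdivide the query into (start, segment) windows."""
    last_start = max(len(query) - window_size, 0)
    return [(s, query[s:s + window_size])
            for s in range(0, last_start + 1, window_offset)]


def min_edits(segment, reference):
    """Fewest insertions, deletions and substitutions needed to
    match segment against any substring of reference."""
    prev = [0] * (len(reference) + 1)  # a match may start anywhere
    for i, a in enumerate(segment, 1):
        cur = [i]
        for j, b in enumerate(reference, 1):
            cur.append(min(prev[j] + 1,              # delete from segment
                           cur[j - 1] + 1,           # insert into segment
                           prev[j - 1] + (a != b)))  # match or substitute
        prev = cur
    return min(prev)


def matching_windows(query, reference, window_size, window_offset,
                     error_tolerance):
    """Start positions of windows found within the tolerance."""
    return [start
            for start, seg in windows(query, window_size, window_offset)
            if min_edits(seg, reference) <= error_tolerance]


if __name__ == "__main__":
    ref = "GGACGTACGTCCGG"
    print(matching_windows("ACGTACGTAA", ref, 4, 3, 0))
    print(matching_windows("ACGTACGTAA", ref, 4, 3, 1))
```

A window offset smaller than the window size makes successive
segments overlap, so a match that straddles a segment boundary
is less likely to be missed.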

Using a combination of these three parameters, a
database (such as a DNA database) can be searched for
sequences containing regions of homology and the
appropriate sequences are scored with an initial value.
Subsequently, these homologous regions are examined using
dot matrix homology plots to determine regions of homology
versus regions of repetition. Smith-Waterman alignments
can be used to display the results of the homology search.
The INHERIT software can be executed by a Sun computer
system programmed with the UNIX operating system.

Search alternatives to INHERIT include the BLAST
program, GCG (available from the Genetics Computer Group,
WI) and the Dasher program (Temple Smith, Boston
University, Boston, MA). Nucleotide sequences can be
searched against Genbank, EMBL or custom databases such as
GENESEQ (available from Intelligenetics, Mountain View, CA)
or other databases for genes. In addition, we have
searched some sequences against our own in-house database.

In preferred embodiments, the transcript sequences are
analyzed by the INHERIT software for best conformance with
a reference gene transcript to assign a sequence identifier
and assigned the degree of homology, which together are the
identified sequence value and are input into, and further
processed by, a Macintosh personal computer (available from
Apple) programmed with an "abundance sort and subtraction
analysis" computer program (to be described below).

Prior to the abundance sort and subtraction analysis
program (also denoted as the "abundance sort" program),
identified sequences from the cDNA clones are assigned
value (according to the parameters given above) by degree
of match according to the following categories: "exact"
matches (regions with a high degree of identity),
homologous human matches (regions of high similarity, but
not "exact" matches), homologous non-human matches (regions
of high similarity present in species other than human), or
non matches (no significant regions of homology to
previously identified nucleotide sequences stored in the
form of the database). Alternately, the degree of match
can be a numeric value as described below.
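
The four categories can be represented as in the sketch below.
The cutoff values 0.95 and 0.60, the use of a fractional
identity score, and the names Match and classify are
placeholders assumed for illustration; the text does not state
numeric thresholds for the categories.

```python
from enum import Enum


class Match(Enum):
    EXACT = "exact"
    HOMOLOGOUS_HUMAN = "homologous human"
    HOMOLOGOUS_NON_HUMAN = "homologous non-human"
    NON_MATCH = "non-match"


def classify(identity, species, exact_cutoff=0.95, homology_cutoff=0.60):
    """Assign a degree-of-match category to one search hit.

    identity is the fraction of identical positions in the best
    hit and species is the source of the matched reference
    entry. Both cutoffs are illustrative placeholders.
    """
    if identity >= exact_cutoff:
        return Match.EXACT
    if identity >= homology_cutoff:
        if species == "human":
            return Match.HOMOLOGOUS_HUMAN
        return Match.HOMOLOGOUS_NON_HUMAN
    return Match.NON_MATCH


if __name__ == "__main__":
    for identity, species in [(0.99, "human"), (0.80, "mouse"), (0.20, "human")]:
        print(identity, species, classify(identity, species).value)
```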

With reference again to the step of identifying matches
between reference sequences and database entries, protein
and peptide sequences can be deduced from the nucleic acid
sequences. Using the deduced polypeptide sequence, the
match identification can be performed in a manner analogous
to that done with cDNA sequences. A protein sequence is
used as a query sequence and compared to the previously
identified sequences contained in a database such as the
Swiss/Prot, PIR and the NBRF Protein database to find
homologous proteins. These proteins are initially scored
for homology using a homology score Table (Orcutt, B.C.
and Dayhoff, M.O., Scoring Matrices, PIR Report MAT-0285
(February 1985)), resulting in an INIT score. The
homologous regions are aligned to obtain the highest
matching scores by inserting a gap, which adds a probable
deleted portion. The matching score is recalculated using
the homology score Table and the insertion score Table,
resulting in an optimized (OPT) score. Even in the absence
of knowledge of the proper reading frame of an isolated
sequence, the above-described protein homology search may
be performed by searching all 3 reading frames.

Peptide and protein sequence homologies can also be
ascertained using the INHERIT 670 Sequence Analysis System
in an analogous way to that used in DNA sequence
homologies. Pattern Specification Language and parameter
windows are used to search protein databases for sequences
containing regions of homology which are scored with an
initial value. Subsequent display in a dot-matrix homology
plot shows regions of homology versus regions of
repetition. Additional search tools that are available to
use on pattern search databases include PLsearch Blocks
(available from Henikoff & Henikoff, University of
Washington, Seattle), Dasher and GCG. Pattern search
databases include, but are not limited to, Protein Blocks
(available from Henikoff & Henikoff, University of
Washington, Seattle), Brookhaven Protein (available from
the Brookhaven National Laboratory, Brookhaven, MA),
PROSITE (available from Amos Bairoch, University of Geneva,
Switzerland), ProDom (available from Temple Smith, Boston
University), and PROTEIN MOTIF FINGERPRINT (available from
University of Leeds, United Kingdom).
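
The search over all 3 reading frames mentioned above, for a sequence whose proper frame is unknown, can be illustrated with a short sketch. This is not the INHERIT implementation: the function name `translate_three_frames`, the use of the standard genetic code, and the choice to mark unrecognized codons with "X" are assumptions made only for this example, and only the three forward frames are translated.

```python
from itertools import product

# Standard genetic code laid out in TCAG order. Using this particular code
# table is an assumption of the sketch; the source does not name one.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_three_frames(cdna):
    """Return the deduced peptide for each of the 3 forward reading frames.

    Stop codons are shown as '*'; codons with unrecognized letters as 'X'.
    Trailing bases that do not fill a whole codon are ignored.
    """
    seq = cdna.upper().replace("U", "T")
    peptides = []
    for offset in range(3):
        codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
        peptides.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return peptides


if __name__ == "__main__":
    for frame, peptide in enumerate(translate_three_frames("ATGGCCTAA"), 1):
        print(frame, peptide)
```

Each of the three deduced peptides would then be used as a separate query against the protein database, and the best-scoring frame retained.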

The ABI Assembler application software, part of the
INHERIT DNA analysis system (available from Applied
Biosystems, Inc., Foster City, CA), can be employed to
create and manage sequence assembly projects by assembling
data from selected sequence fragments into a larger
sequence. The Assembler software combines two advanced
computer technologies which maximize the ability to
assemble sequenced DNA fragments into Assemblages, a
special grouping of data where the relationships between
sequences are shown by graphic overlap, alignment and
statistical views. The process is based on the
Meyers-Kececioglu model of fragment assembly (INHERIT
Assembler User Manual, Applied Biosystems, Inc., Foster
City, CA), and uses graph theory as the foundation of a
very rigorous multiple sequence alignment engine for
assembling DNA sequence fragments. Other assembly programs
that can be used include MEGALIGN (available from DNASTAR,
Inc., Madison, WI), Dasher and STADEN (available from Roger
Staden, Cambridge, England).

Next, with reference to Fig. 2, we describe in more
detail the "abundance sort" program which implements
above-mentioned "step (b)" to tabulate the number of
sequences of the library which match each database entry
(the "abundance number" for each database entry).
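
The tabulation performed in "step (b)" can be sketched as a simple count. This is a minimal illustration rather than the FoxBASE listing of Table 5: the function name `abundance_numbers` and the `(clone_id, entry)` record layout are hypothetical, and unmatched sequences are assumed to carry `None` as their entry.

```python
from collections import Counter


def abundance_numbers(identified_sequences):
    """Count how many library sequences match each database entry.

    identified_sequences: iterable of (clone_id, entry) pairs, where entry
    is the matched database entry name or None when nothing matched
    (a hypothetical record layout chosen for this sketch).
    Returns (entry, abundance number) pairs, most abundant first, with
    ties broken alphabetically by entry name.
    """
    counts = Counter(
        entry for _clone_id, entry in identified_sequences if entry is not None
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    library = [
        (15365, "HSRPL41"), (15366, "HSRPL41"), (15193, "HSFIBI"),
        (15194, None), (15195, "HSRPL41"),
    ]
    for entry, n in abundance_numbers(library):
        print(entry, n)
```

Because each entry appears once in the result, redundancies are eliminated as a side effect of counting.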

Fig. 2 is a flow chart of a preferred embodiment of
the abundance sort program. A source code listing of this
embodiment of the abundance sort program is set forth in
Table 5. In the Table 5 implementation, the abundance sort
program is written using the FoxBASE programming language
commercially available from Microsoft Corporation.
Although FoxBASE was the program chosen for the first
iteration of this technology, it should not be considered
limiting. Many other programming languages, Sybase being a
particularly desirable alternative, can also be used, as
will be obvious to one with ordinary skill in the art. The
subroutine names specified in Fig. 2 correspond to
subroutines listed in Table 5.

With reference again to Fig. 2, the "Identified
Sequences" are transcript sequences representing each
sequence of the library and a corresponding identification
of the database entry (if any) which it matches. In other
words, the "Identified Sequences" are transcript sequences
representing the output of above-discussed "step (a)."

Fig. 3 is a block diagram of a system for implementing
the invention. The Fig. 3 system includes library
generation unit 2 which generates a library and asserts an
output stream of transcript sequences indicative of the
biological sequences comprising the library. Programmed
processor 4 receives the data stream output from unit 2 and
processes this data in accordance with above-discussed
"step (a)" to generate the Identified Sequences. Processor
4 can be a processor programmed with the commercially
available computer program known as the INHERIT 670
Sequence Analysis System and the commercially available
computer program known as the Factura program (both
available from Applied Biosystems Inc.) and with the UNIX
operating system.

Still with reference to Fig. 3, the Identified
Sequences are loaded into processor 6, which is programmed
with the abundance sort program. Processor 6 generates the
Final Transcript sequences indicated in both Figs. 2 and 3.
Fig. 4 shows a more detailed block diagram of a planned
relational computer system, including various searching
techniques which can be implemented, along with an
assortment of databases to query against.

With reference to Fig. 2, the abundance sort program
first performs an operation known as "Tempnum" on the
Identified Sequences, to discard all of the Identified
Sequences except those which match database entries of
selected types. For example, the Tempnum process can
select Identified Sequences which represent matches of the
following types with database entries (see above for
definition): "exact" matches, human "homologous" matches,
"other species" matches (representing genes present in
species other than human), "no" matches (no significant
regions of homology with database entries representing
previously identified nucleotide sequences), "I" matches
(Incyte for not previously known DNA sequences), or "X"
matches (matches ESTs in reference database). This
eliminates the U, S, M, V, A, R and D sequences (see Table 1
for definitions).
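
The "Tempnum" selection can be sketched as a filter on the designation letter of each identified sequence. The sketch below is an illustration only, not the Table 5 subroutine: the record field `match_type` and the function name `tempnum` are hypothetical, and because the letters for the retained match types are defined elsewhere (Table 1), the filter is expressed solely in terms of the discarded letters U, S, M, V, A, R and D named in the text.

```python
# Designation letters that the text says are eliminated by Tempnum.
DISCARDED_TYPES = frozenset("USMVARD")


def tempnum(identified_sequences):
    """Keep only records whose match type is not one of the discarded letters.

    identified_sequences: iterable of dicts with a 'match_type' key holding
    a single designation letter (a hypothetical layout for this sketch).
    """
    return [
        record for record in identified_sequences
        if record["match_type"].upper() not in DISCARDED_TYPES
    ]


if __name__ == "__main__":
    records = [
        {"clone": 15004, "match_type": "I"},
        {"clone": 15458, "match_type": "R"},
        {"clone": 15583, "match_type": "X"},
    ]
    print([r["clone"] for r in tempnum(records)])
```

Expressing the rule as an exclusion list keeps the sketch valid even if further retained designations are defined in Table 1.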

The identified sequence values selected during the
"Tempnum" process then undergo a further selection (weeding
out) operation known as "Tempred." This operation can, for
example, discard all identified sequence values
representing matches with selected database entries.

The identified sequence values selected during the
"Tempred" process are then classified according to library,
during the "Tempdesig" operation. It is contemplated that
the "Identified Sequences" can represent sequences from a
single library, or from two or more libraries.

Consider first the case that the identified sequence
values represent sequences from a single library. In this
case, all the identified sequence values determined during
"Tempred" undergo sorting in the "Templib" operation,
further sorting in the "Libsort" operation, and finally
additional sorting in the "Temptarsort" operation. For
example, these three sorting operations can sort the
identified sequences in order of decreasing "abundance
number" (to generate a list of decreasing abundance
numbers, each abundance number corresponding to a unique
identified sequence entry, or several lists of decreasing
abundance numbers, with the abundance numbers in each list
corresponding to database entries of a selected type) with
redundancies eliminated from each sorted list. In this
case, the operation identified as "Cruncher" can be
bypassed, so that the "Final Data" values are the organized
transcript sequences produced during the "Temptarsort"
operation.

We next consider the case that the transcript
sequences produced during the "Tempred" operation represent
sequences from two libraries (which we will denote the
"target" library and the "subtractant" library). For
example, the target library may consist of cDNA sequences
from clones of a diseased cell, while the subtractant
library may consist of cDNA sequences from clones of the
diseased cell after treatment by exposure to a drug. For
another example, the target library may consist of cDNA
sequences from clones of a cell type from a young human,
while the subtractant library may consist of cDNA sequences
from clones of the same cell type from the same human at
different ages.

In this case, the "Tempdesig" operation routes all
transcript sequences representing the target library for
processing in accordance with "Templib" (and then "Libsort"
and "Temptarsort"), and routes all transcript sequences
representing the subtractant library for processing in
accordance with "Tempsub" (and then "Subsort" and
"Tempsubsort"). For example, the consecutive "Templib,"
"Libsort," and "Temptarsort" sorting operations sort
identified sequences from the target library in order of
decreasing abundance number (to generate a list of
decreasing abundance numbers, each abundance number
corresponding to a database entry, or several lists of
decreasing abundance numbers, with the abundance numbers in
each list corresponding to database entries of a selected
type) with redundancies eliminated from each sorted list.
The consecutive "Tempsub," "Subsort," and "Tempsubsort"
sorting operations sort identified sequences from the
subtractant library in order of decreasing abundance number
(to generate a list of decreasing abundance numbers, each
abundance number corresponding to a database entry, or
several lists of decreasing abundance numbers, with the
abundance numbers in each list corresponding to database
entries of a selected type) with redundancies eliminated
from each sorted list.

The transcript sequences output from the "Temptarsort"
operation typically represent sorted lists from which a
histogram could be generated in which position along one
(e.g., horizontal) axis indicates abundance number (of
target library sequences), and position along another
(e.g., vertical) axis indicates identified sequence value
(e.g., human or non-human gene type). Similarly, the
transcript sequences output from the "Tempsubsort"
operation typically represent sorted lists from which a
histogram could be generated in which position along one
(e.g., horizontal) axis indicates abundance number (of
subtractant library sequences), and position along another
(e.g., vertical) axis indicates identified sequence value
(e.g., human or non-human gene type).

The transcript sequences (sorted lists) output from
the Tempsubsort and Temptarsort sorting operations are
combined during the operation identified as "Cruncher."
The "Cruncher" process identifies pairs of corresponding
target and subtractant abundance numbers (both representing
the same identified sequence value), and divides one by the
other to generate a "ratio" value for each pair of
corresponding abundance numbers, and then sorts the ratio
values in order of decreasing ratio value. The data output
from the "Cruncher" operation (the Final Transcript
sequence in Fig. 2) is typically a sorted list from which a
histogram could be generated in which position along one
axis indicates the size of a ratio of abundance numbers
(for corresponding identified sequence values from target
and subtractant libraries) and position along another axis
indicates identified sequence value (e.g., gene type).

Preferably, prior to obtaining a ratio between the two
library abundance values, the Cruncher operation also
divides each ratio value by the total number of sequences
in one or both of the target and subtractant libraries.
The resulting lists of "relative" ratio values generated by
the Cruncher operation are useful for many medical,
scientific, and industrial applications. Also preferably,
the output of the Cruncher operation is a set of lists,
each list representing a sequence of decreasing ratio
values for a different selected subset (e.g. protein
family) of database entries.
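
The pairing, division and sorting performed by "Cruncher" can be sketched as follows. This is an illustrative reading of the description, not the Table 5 code: the function name `cruncher` is hypothetical, both libraries are assumed to be normalized by their own totals (the text allows "one or both"), and only entries present in both libraries are paired here.

```python
def cruncher(target, subtractant, target_total, subtractant_total):
    """Return (entry, relative ratio) pairs sorted by decreasing ratio.

    target, subtractant: dicts mapping database entry -> abundance number.
    target_total, subtractant_total: total sequences in each library.
    The relative ratio is (target count / target total) divided by
    (subtractant count / subtractant total), computed in a single step.
    """
    ratios = {}
    for entry in target.keys() & subtractant.keys():
        ratios[entry] = (target[entry] * subtractant_total) / (
            subtractant[entry] * target_total
        )
    # Decreasing ratio, ties broken alphabetically for a stable listing.
    return sorted(ratios.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    target = {"geneA": 10, "geneB": 2, "geneC": 5}
    subtractant = {"geneA": 5, "geneB": 4, "geneD": 1}
    for entry, ratio in cruncher(target, subtractant, 100, 200):
        print(entry, ratio)
```

Producing one list per subset of database entries (e.g. per protein family) would amount to calling the same function on counts restricted to that subset.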

In one example, the abundance sort program of the
invention tabulates for a library the numbers of mRNA
transcripts corresponding to each gene identified in a
database. These numbers are divided by the total number of
clones sampled. The results of the division reflect the
relative abundance of the mRNA transcripts in the cell type
or tissue from which they were obtained. Obtaining this
final data set is referred to herein as "gene transcript
image analysis." The resulting subtracted data show
exactly what proteins and genes are upregulated and
downregulated in highly detailed complexity.

Table 2 is an abundance table listing the various gene
transcripts in an induced HUVEC library. The transcripts
are listed in order of decreasing abundance. This
computerized sorting simplifies analysis of the tissue and
speeds identification of significant new proteins which are
specific to this cell type. This type of endothelial cell
lines tissues of the cardiovascular system, and the more
that is known about its composition, particularly in
response to activation, the more choices of protein targets
become available to affect in treating disorders of this
tissue, such as the highly prevalent atherosclerosis.

6.7. MONOCYTE-CELL AND MAST-CELL cDNA LIBRARIES

Tables 3 and 4 show truncated comparisons of two
libraries. In Tables 3 and 4 the "normal monocytes" are
the HMC-1 cells, and the "activated macrophages" are the
THP-1 cells pretreated with PMA and activated with LPS.
Table 3 lists, in descending order of abundance, the most
abundant gene transcripts for both cell types. With only
15 gene transcripts from each cell type, this table permits
quick, qualitative comparison of the most common
transcripts. This abundance sort, with its convenient
side-by-side display, provides an immediately useful
research tool. In this example, this research tool
discloses that 1) only one of the top 15 activated
macrophage transcripts is found in the top 15 normal
monocyte gene transcripts (poly A binding protein); and 2)
a new gene transcript (previously unreported in other
databases) is relatively highly represented in activated
macrophages but is not similarly prominent in normal
macrophages. Such a research tool provides researchers
with a short-cut to new proteins, such as receptors,
cell-surface and intracellular signalling molecules, which
can serve as drug targets in commercial drug screening
programs. Such a tool could save considerable time over
that consumed by a hit and miss discovery program aimed at
identifying important proteins in and around cells, because
those proteins carrying out everyday cellular functions and
represented as steady state mRNA are quickly eliminated
from further characterization.

This illustrates how the gene transcript profiles
change with altered cellular function. Those skilled in
the art know that the biochemical composition of cells also
changes with other functional changes such as cancer,
including cancer's various stages, and exposure to
toxicity. A gene transcript subtraction profile such as in
Table 3 is useful as a first screening tool for such gene
expression and protein studies.

6.8. SUBTRACTION ANALYSIS OF NORMAL MONOCYTE-CELL AND
ACTIVATED MONOCYTE-CELL cDNA LIBRARIES

Once the cDNA data were in the computer, the computer
program as disclosed in Table 5 was used to obtain ratios
of all the gene transcripts in the two libraries discussed
in Example 6.7, and the gene transcripts were sorted by the
descending values of their ratios. If a gene transcript is
not represented in one library, that gene transcript's
abundance is unknown but appears to be less than 1. As an
approximation (and to obtain a ratio, which would not be
possible if the unrepresented gene were given an abundance
of zero), genes which are represented in only one of the
two libraries are assigned an abundance of 1/2. Using 1/2
for unrepresented clones increases the relative importance
of "turned-on" and "turned-off" genes, whose products would
be drug candidates. The resulting print-out is called a
subtraction table and is an extremely valuable screening
method, as is shown by the following data.
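
The effect of assigning an abundance of 1/2 to unrepresented gene transcripts can be seen in a small sketch. It is an illustration under stated assumptions rather than the Table 5 program: the function name `subtraction_table` and the row layout are hypothetical, and a stored count of zero is treated the same as a missing entry.

```python
def subtraction_table(target, subtractant, unrepresented=0.5):
    """Build rows (entry, target abundance, subtractant abundance, ratio).

    target, subtractant: dicts mapping database entry -> clone count.
    An entry absent from one library (or stored with a count of 0) is
    given the abundance 'unrepresented' so that a ratio can be formed.
    Rows are sorted by descending ratio, ties broken by entry name.
    """
    rows = []
    for entry in target.keys() | subtractant.keys():
        t = target.get(entry) or unrepresented
        s = subtractant.get(entry) or unrepresented
        rows.append((entry, t, s, t / s))
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows


if __name__ == "__main__":
    activated = {"geneA": 8, "geneB": 3}
    normal = {"geneB": 6, "geneC": 4}
    for row in subtraction_table(activated, normal):
        print(row)
```

With these made-up counts, the transcript seen only in the target library rises to the top of the table (ratio 16) and the one seen only in the subtractant library falls to the bottom (ratio 0.125), which is the "turned-on" and "turned-off" emphasis the text describes.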

Table 4 is a subtraction table, in which the normal
monocyte library was electronically "subtracted" from the
activated macrophage library. This table highlights most
effectively the changes in abundance of the gene
transcripts by activation of macrophages. Even among the
first 20 gene transcripts listed, there are several unknown
gene transcripts. Thus, electronic subtraction is a useful
tool with which to assist researchers in identifying much
more quickly the basic biochemical changes between two cell
types. Such a tool can save universities and
pharmaceutical companies which spend billions of dollars on
research valuable time and laboratory resources at the
early discovery stage and can speed up the drug development
cycle, which in turn permits researchers to set up drug
screening programs much earlier. Thus, this research tool
provides a way to get new drugs to the public faster and
more economically.

Also, such a subtraction table can be obtained for
patient diagnosis. An individual patient sample (such as
monocytes obtained from a biopsy or blood sample) can be
compared with data provided herein to diagnose conditions
associated with macrophage activation.

Table 4 uncovered many new gene transcripts (labeled
Incyte clones). Note that many genes are turned on in the
activated macrophage (i.e., the monocyte had a 0 in the
bgfreq column). This screening method is superior to other
screening techniques, such as the western blot, which are
incapable of uncovering such a multitude of discrete new
gene transcripts.

The subtraction-screening technique has also uncovered
a high number of cancer gene transcripts (oncogenes rho,
ETS2, rab-2 ras, YPT1-related, and acute myeloid leukemia
mRNA) in the activated macrophage. These transcripts may
be attributed to the use of immortalized cell lines and are
inherently interesting for that reason. This screening
technique offers a detailed picture of upregulated
transcripts including oncogenes, which helps explain why
anti-cancer drugs interfere with the patient's immunity
mediated by activated macrophages. Armed with knowledge
gained from this screening method, those skilled in the art
can set up more targeted, more effective drug screening
programs to identify drugs which are differentially
effective against 1) both relevant cancers and activated
macrophage conditions with the same gene transcript
profile; 2) cancer alone; and 3) activated macrophage
conditions.

Smooth muscle senescent protein (22 kd) was
upregulated in the activated macrophage, which indicates
that it is a candidate to block in controlling
inflammation.

6.9. SUBTRACTION ANALYSIS OF NORMAL LIVER CELLS AND
HEPATITIS INFECTED LIVER CELL cDNA LIBRARIES

In this example, rats are exposed to hepatitis virus
and maintained in the colony until they show definite signs
of hepatitis. Of the rats diagnosed with hepatitis, one
half of the rats are treated with a new anti-hepatitis
agent (AHA). Liver samples are obtained from all rats
before exposure to the hepatitis virus and at the end of
AHA treatment or no treatment. In addition, liver samples
can be obtained from rats with hepatitis just prior to AHA
treatment.

The liver tissue is treated as described in Examples
6.2 and 6.3 to obtain mRNA and subsequently to sequence
cDNA. The cDNA from each sample are processed and analyzed
for abundance according to the computer program in Table 5.
The resulting gene transcript images of the cDNA provide
detailed pictures of the baseline (control) for each animal
and of the infected and/or treated state of the animals.
cDNA data for a group of samples can be combined into a
group summary gene transcript profile, for all control
samples, all samples from infected rats and all samples
from AHA-treated rats.

Subtractions are performed between appropriate
individual libraries and the grouped libraries. For
individual animals, control and post-study samples can be
subtracted. Also, if samples are obtained before and after
AHA treatment, that data from individual animals and
treatment groups can be subtracted. In addition, the data
for all control samples can be pooled and averaged. The
control average can be subtracted from averages of both
post-study AHA and post-study non-AHA cDNA samples. If
pre- and post-treatment samples are available, pre- and
post-treatment samples can be compared individually (or
electronically averaged) and subtracted.

These subtraction tables are used in two general ways.
First, the differences are analyzed for gene transcripts
which are associated with continuing hepatic deterioration
or healing. The subtraction tables are tools to isolate
the effects of the drug treatment from the underlying basic
pathology of hepatitis. Because hepatitis affects many
parameters, additional liver toxicity has been difficult to
detect with only blood tests for the usual enzymes. The
gene transcript profile and subtraction provides a much
more complex biochemical picture which researchers have
needed to analyze such difficult problems.

Second, the subtraction tables provide a tool for
identifying clinical markers, individual proteins or other
biochemical determinants which are used to predict and/or
evaluate a clinical endpoint, such as disease, improvement
due to the drug, and even additional pathology due to the
drug. The subtraction tables specifically highlight genes
which are turned on or off. Thus, the subtraction tables
provide a first screen for a set of gene transcript
candidates for use as clinical markers. Subsequently,
electronic subtractions of additional cell and tissue
libraries reveal which of the potential markers are in fact
found in different cell and tissue libraries. Candidate
gene transcripts found in additional libraries are removed
from the set of potential clinical markers. Then, tests of
blood or other relevant samples which are known to lack and
have the relevant condition are compared to validate the
selection of the clinical marker. In this method, the
particular physiologic function of the protein transcript
need not be determined to qualify the gene transcript as a
clinical marker.
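
The narrowing of marker candidates against additional libraries amounts to a set difference, sketched below. The function name `screen_marker_candidates` and the data layout are hypothetical and chosen only for illustration; the source does not describe how this step was coded.

```python
def screen_marker_candidates(candidates, additional_libraries):
    """Drop candidate transcripts that also occur in other libraries.

    candidates: iterable of database entry names taken from a
        subtraction table (the first screen).
    additional_libraries: dict mapping library name -> iterable of the
        entry names found in that cell or tissue library.
    Returns the remaining candidates in alphabetical order.
    """
    seen_elsewhere = set().union(*additional_libraries.values())
    return sorted(set(candidates) - seen_elsewhere)


if __name__ == "__main__":
    first_screen = ["markerA", "markerB", "markerC"]
    others = {
        "kidney": ["markerB", "geneX"],
        "lung": ["geneY"],
    }
    print(screen_marker_candidates(first_screen, others))
```

The surviving candidates would still need the validation against samples known to have and to lack the condition that the text describes.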

6.10. ELECTRONIC NORTHERN BLOT

One limitation of electronic subtraction is that it is
difficult to compare more than a pair of images at once.
Once particular individual gene products are identified as
relevant to further study (via electronic subtraction or
other methods), it is useful to study the expression of
single genes in a multitude of different tissues. In the
lab, the technique of "Northern" blot hybridization is used
for this purpose. In this technique, a single cDNA, or a
probe corresponding thereto, is labeled and then hybridized
against a blot containing RNA samples prepared from a
multitude of tissues or cell types. Upon autoradiography,
the pattern of expression of that particular gene, one at a
time, can be quantitated in all the included samples.

In contrast, a further embodiment of this invention is
the computerized form of this process, termed here
"electronic northern blot." In this variation, a single
gene is queried for expression against a multitude of
prepared and sequenced libraries present within the
database. In this way, the pattern of expression of any
single candidate gene can be examined instantaneously and
effortlessly. More candidate genes can thus be scanned,
leading to more frequent and fruitfully relevant
discoveries. The computer program included as Table 5
includes a program for performing this function, and Table
6 is a partial listing of entries of the database used in
the electronic northern blot analysis.
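
An electronic northern blot query can be sketched as a lookup of one entry across every library in the database. This is an illustration, not the Table 5 program: the function name `electronic_northern` and the `{library: (counts, total clones)}` layout are hypothetical. The HUVEC figures in the usage example (entry HSET, 9 clones out of 5000 analyzed) are taken from Table 2; the other two libraries are made-up placeholders.

```python
def electronic_northern(entry, libraries):
    """Report the abundance of one database entry in every library.

    libraries: dict mapping library name -> (counts, total_clones), where
        counts maps database entry -> number of matching clones.
    Returns rows (library, clone count, relative abundance), sorted by
    decreasing relative abundance and then by library name.
    """
    rows = []
    for name, (counts, total_clones) in libraries.items():
        n = counts.get(entry, 0)
        relative = n / total_clones if total_clones else 0.0
        rows.append((name, n, relative))
    return sorted(rows, key=lambda row: (-row[2], row[0]))


if __name__ == "__main__":
    database = {
        "HUVEC": ({"HSET": 9, "HSFIBI": 47}, 5000),
        "library_B": ({"HSFIBI": 3}, 1000),   # placeholder library
        "library_C": ({}, 2000),              # placeholder library
    }
    for row in electronic_northern("HSET", database):
        print(row)
```

Each output row plays the role of one lane of a conventional blot, with relative abundance standing in for band intensity.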

6.11. PHASE I CLINICAL TRIALS

Based on the establishment of safety and effectiveness
in the above animal tests, Phase I clinical tests are
undertaken. Normal patients are subjected to the usual
preliminary clinical laboratory tests. In addition,
appropriate specimens are taken and subjected to gene
transcript analysis. Additional patient specimens are
taken at predetermined intervals during the test. The
specimens are subjected to gene transcript analysis as
described above. In addition, the gene transcript changes
noted in the earlier rat toxicity study are carefully
evaluated as clinical markers in the followed patients.
Changes in the gene transcript analyses are evaluated as
indicators of toxicity by correlation with clinical signs
and symptoms and other laboratory results. In addition,
subtraction is performed on individual patient specimens
and on averaged patient specimens. The subtraction
analysis highlights any toxicological changes in the
treated patients. This is a highly refined determinant of
toxicity. The subtraction method also annotates clinical
markers. Further subgroups can be analyzed by subtraction
analysis, including, for example, 1) segregation by
occurrence and type of adverse effect; and 2) segregation
by dosage.

6.12. GENE TRANSCRIPT IMAGING ANALYSIS IN CLINICAL STUDIES

A gene transcript imaging analysis (or multiple gene
transcript imaging analyses) is a useful tool in other
clinical studies. For example, the differences in gene
transcript imaging analyses before and after treatment can
be assessed for patients on placebo and drug treatment.
This method also effectively screens for clinical markers
to follow in clinical use of the drug.

6.13. COMPARATIVE GENE TRANSCRIPT ANALYSIS BETWEEN SPECIES

The subtraction method can be used to screen cDNA
libraries from diverse sources. For example, the same cell
types from different species can be compared by gene
transcript analysis to screen for specific differences,
such as in detoxification enzyme systems. Such testing
aids in the selection and validation of an animal model for
the commercial purpose of drug screening or toxicological
testing of drugs intended for human or animal use. When
the comparison between animals of different species is
shown in columns for each species, we refer to this as an
interspecies comparison, or zoo blot.

Embodiments of this invention may employ databases
such as those written using the FoxBASE programming
language commercially available from Microsoft Corporation.
Other embodiments of the invention employ other databases,
such as a random peptide database, a polymer database, a
synthetic oligomer database, or an oligonucleotide database
of the type described in U.S. Patent 5,270,170, issued
December 14, 1993 to Cull, et al., PCT International
Application Publication No. WO 9322684, published November
11, 1993, PCT International Application Publication No. WO
9306121, published April 1, 1993, or PCT International
Application Publication No. WO 9119818, published December
26, 1991. These four references (whose text is
incorporated herein by reference) include teaching which
may be applied in implementing such other embodiments of
the present invention.

All references referred to in the preceding text are
hereby expressly incorporated by reference herein.
Various modifications and variations of the described
method and system of the invention will be apparent to
those skilled in the art without departing from the scope
and spirit of the invention. Although the invention has
been described in connection with specific preferred
embodiments, it should be understood that the invention as
claimed should not be unduly limited to such specific
embodiments.
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Clone numbers 15000 through 20000 
Libraries: HUVEC
Arranged by ABUNDANCE
Total clones analyzed: 5000

319 genes, for a total of 1713 clones

      number   N    c   entry       descriptor
  1   15365    67       HSRPL41     Riboptn L41
  2   15004    65       NCY015004   INCYTE 015004
  3   15638    63       NCY015638   INCYTE 015638
  4   15390    50       NCY015390   INCYTE 015390
  5   15193    47       HSFIBI      Fibronectin
  6   15220    47       RRRPL9      Riboptn L9
  7   15280             NCY015280   INCYTE 015280
  8   15583             H62060      EST HHCH09 (IGR)
  9   15662             HSACTCGR    Actin, gamma
 10   15026    29       NCY015026   INCYTE 015026
 11   15279    24                   Elf 1-alpha
 12   15027    23       NCY015027   INCYTE 015027
 13   15033    20       NCY015033   INCYTE 015033
 14   15198    20                   INCYTE 015198
 15   15809                         Collagenase
 16   15221    19       NCY015221   INCYTE 015221
 17   15263    19       NCY015263   INCYTE 015263
 18   15290    19                   INCYTE 015290
 19   15350    18                   INCYTE 015350
 20   15030    17       NCY015030   INCYTE 015030
 21   15234    17                   INCYTE 015234
 22   15459    16       NCY015459   INCYTE 015459
 23   15353             NCY015353   INCYTE 015353
 24   15378    15       S76965      Ptn kinase inhib
 25   15255    14       HUMTHYB4    Thymosin beta-4
 26   15401    14       HSLIPCR     Lipocortin I
 27   15425    14       HSPOLYAB    Poly-A bp
 28   18212    14       HUMTHYMA    Thymosin, alpha
 29   18216    14       HSMRP1      Motility relat ptn; MRP-1; CD-9
 30   15189    13       HS18D       Interferon induc ptn 1-8D
 31   15031    12       HUMFKBP     FK506 bp
 32   15306    12       HSH2AZ      Histone H2A
 33   15621    12       HUMLEC      Lectin, B-galbp, 14kDa
 34   15789    11       NCY015789   INCYTE 015789
 35   16578    11       HSRPS11     Riboptn S11
 36   16632    11       M61984      EST HHCA13 (IGR)
 37   18314    11       NCY018314   INCYTE 018314
 38   15367    10       NCY015367   INCYTE 015367
 39   15415    10       HSIFNIN1    Interferon induc mRNA
 40   15633    10       HSLDHAR     Lactate dehydrogenase
 41   15813    10       CHKNMHCB    C Myosin heavy chain B
 42   18210    10       NCY018210   INCYTE 018210
 43   18233    10       HSRPII140   RNA polymerase II
 44   18996    10       NCY018996   INCYTE 018996
 45   15088     9       HUMFERL     Ferritin, light chain
 46   15714     9       NCY015714   INCYTE 015714
 47   15720     9       NCY015720   INCYTE 015720
 48   15863     9       NCY015863   INCYTE 015863
 49   16121     9       HSET        Endothelin
 50   18252     9       NCY018252   INCYTE 018252
 51   15351     8       HUMALBP     Lipid bp, adipocyte
 52   15370     8       NCY015370   INCYTE 015370






WO95/20681 



PCT/US95/01160 



TABLE 2 Con't

     number               entry      s  descriptor

 53  15670   8            BTCIASHI   V  NADH-ubiq oxidoreductase
 54  15795                NCY015795     INCYTE 015795
 55  16245   8            NCY016245     INCYTE 016245
 56  18262                NCY018262     INCYTE 018262
 57  18321   8            HSRPL17       Riboptn L17
 58  15126   7            XLRPL1BR   F  Riboptn L1
 59  15133   7            HSAC07        Actin, beta
 60  15245   7            NCY015245     INCYTE 015245
 61  15288   7            NCY015288     INCYTE 015288
 62  15294                HSGAPDR       G-3-PD
 63  15442   7            HUMLAMB       Laminin receptor, 54kDa
 64  15485   7            HSNGMRNA      Uracil DNA glycosylase
 65  16646   7            NCY016646     INCYTE 016646
 66  18003   7            HUMPAIA       Plsmnogen activ gene
 67  15032   6            HUMUB         Ubiquitin
 68  15267   6            HSRPS8        Riboptn S8
 69  15295   6            NCY015295     INCYTE 015295
 70  15458                RNRPS10R   R  Riboptn S10
 71  15832   6            RSGALEM    R  UDP-galactose epimerase
 72  15928   6            HUMAPOJ       Apolipoptn J
 73  16598   6            HUMTBBM40     Tubulin, beta
 74  18218   6            NCY018218     INCYTE 018218
 75  18499   6            HSP27         Hydrophobic ptn p27
 76  18963   6            NCY018963     INCYTE 018963
 77  18997   6            NCY018997     INCYTE 018997
 78  15432   5            HSAGALAR      Galactosidase A, alpha
 79  15475   5            NCY015475     INCYTE 015475
 80  15721   5            NCY015721     INCYTE 015721
 81  15865   5            NCY015865     INCYTE 015865
 82  16270   5            NCY016270     INCYTE 016270
 83  16886   5            NCY016886     INCYTE 016886
 84  18500   5            NCY018500     INCYTE 018500
 85  18503   5            NCY018503     INCYTE 018503
 86  19672   5            RRRPL34    R  Riboptn L34
 87  15086   4            XLRPL1AR   F  Riboptn L1a
 88  15113   4            HUMIFNWRS     tRNA synthetase,
 89  15242   4            NCY015242     INCYTE 015242
 90  15249   4            NCY015249     INCYTE 015249
 91  15377   4            NCY015377     INCYTE 015377
 92  15407   4            NCY015407     INCYTE 015407
 93  15473   4            NCY015473     INCYTE 015473
 94  15588   4            HSRPS12       Riboptn S12
 95  15684   4            HSEF1G        Elf 1-gamma
 96  15782   4            NCY015782     INCYTE 015782
 97  15916   4            HSRPS18       Riboptn S18
 98  15930   4            NCY015930     INCYTE 015930
 99  16108   4            NCY016108     INCYTE 016108
100  16133   4            NCY016133     INCYTE 016133
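
The ordering of Table 2 (most abundant entry first, ties in ascending clone number) can be illustrated with a short Python sketch. This is not the FoxBASE+ code of the listings; the function name `abundance_table` and the `(number, entry, descriptor)` record layout are assumptions made only for this illustration.

```python
from collections import defaultdict


def abundance_table(clones):
    """Collapse clone records into one row per entry, sorted by abundance.

    `clones` is an iterable of (number, entry, descriptor) tuples.
    This record layout is hypothetical and chosen for the sketch.
    """
    groups = defaultdict(list)
    for number, entry, descriptor in clones:
        groups[entry].append((number, descriptor))

    rows = []
    for entry, members in groups.items():
        # The lowest clone number stands for the whole group.
        first_number, descriptor = min(members)
        rows.append((first_number, len(members), entry, descriptor))

    # Most abundant first; ties broken by ascending clone number.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows
```

Each returned row holds the representative clone number, the number of clones that matched the entry, the entry name, and its descriptor.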
TABLE 4

Libraries: THP-1
Subtracting: HMC
Sorted by ABUNDANCE
Total clones analyzed: 7375
1057 genes, for a total of 2151 clones

number       entry      s  descriptor

10022        HUMIL1        IL 1-beta
10036        HSMDNCF       IL-8
10089        HSLAG1CDN     Lymphocyte activ gene
[illegible]  HUMTCSM       RANTES
10003        HUMMIP1A      MIP-1
10689        HSOP          Osteopontin
11050        NCY011050     INCYTE 011050
10937        HSTNFR        TNF-alpha
10176        HSSOD         Superoxide dismutase
10886        HSCDW40       B-cell activ, NGF-relat
10186        HUMAPR        Early resp PMA-induc
10967        HUMGDN        PN-1, glial-deriv
11353        NCY011353     INCYTE 011353
10298        NCY010298     INCYTE 010298
10215        HUM4COLA      Collagenase, type IV
10276        NCY010276     INCYTE 010276
10488        NCY010488     INCYTE 010488
11138        NCY011138     INCYTE 011138
10037        HUMCAPPRO     Adenylate cyclase
10840        HUMADCY       Adenylate cyclase
10672        HSCD44E       Cell adhesion glptn
12837        HUMCYCLOX     Cyclooxygenase-2
10001        NCY010001     INCYTE 010001
10005        NCY010005     INCYTE 010005
10294        NCY010294     INCYTE 010294
10297        NCY010297     INCYTE 010297
10403        NCY010403     INCYTE 010403
10699        NCY010699     INCYTE 010699
10966        NCY010966     INCYTE 010966
12092        NCY012092     INCYTE 012092
12549        HSRHOB        Oncogene rho
10691        HUMARF1BA     ADP-ribosylation fctr
12106        HSADSS        Adenylosuccinate synthetase
10194        HSCATHL       Cathepsin L
10479        CLMCYCA    I  Cyclin A
10031        NCY010031     INCYTE 010031
10203        NCY010203     INCYTE 010203
10288        NCY010288     INCYTE 010288
10372        NCY010372     INCYTE 010372
10471        NCY010471     INCYTE 010471
10484        NCY010484     INCYTE 010484
10859        NCY010859     INCYTE 010859
10890        NCY010890     INCYTE 010890
11511        NCY011511     INCYTE 011511
11868        NCY011868     INCYTE 011868
12820        NCY012820     INCYTE 012820
10133        HSI1RAP       IL-1 antagonist
10516        HUMP2A        Phosphatase, regul 2A
11063        HUMB94        TNF-induc response
11140        HSHB15RNA     HB15 gene; new Ig
10788        NCY001713     INCYTE 001713
10033        NCY010033     INCYTE 010033
10035        NCY010035     INCYTE 010035
10084        NCY010084     INCYTE 010084
10236        NCY010236     INCYTE 010236
10383        NCY010383     INCYTE 010383
TABLE 4 Con't

number  entry  s  descriptor  bgfreq  rfend  ratio
TABLE 5

* Master menu for SUBTRACTION output
SET TALK OFF
SET SAFETY OFF

SET EXACT- C&T ' *. : < *v > ; ■. .t > - . . 

t 6BT TYFEAHEAD TO^nr-v* - v-- - - 

CLEAR
SET DEVICE TO SCREEN
USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
GO TOP
STORE NUMBER TO INITIATE
GO BOTTOM
STORE NUMBER TO TERMINATE

STORE ' ' TO Target1
STORE ' ' TO Target2
STORE ' ' TO Target3
STORE ' ' TO Object1
STORE ' ' TO Object2
STORE ' ' TO Object3
STORE 0 TO ANAL
STORE 0 TO EMATCH
STORE 0 TO HMATCH
STORE 0 TO OMATCH
STORE 0 TO IMATCH
STORE 0 TO PTP
STORE 1 TO BAIL
DO WHILE .T.

* Program.: Subtraction 2.fmt
* Date....: 10/11/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Subtraction 2
*
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",9 COLOR 0,0,0

* FBffiM -75,120 TO.178,241 STOS 3871 COLOR 0,0, -1,24610 1 -1,8947 ' "^'^ 




8 .PIXELS *198;i26^GET PTP f ST3CLE 65536 TOOT -Chitfago<,12 PICTURE •8*C Print to file- ctze 15' S 
@ PIXELS 90,9 TO 181,109 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 90,288 TO 181,397 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 81,296 SAY "Background:" STYLE 65536 FONT "Geneva",270 COLOR 0,0,-1,-1,-1,-1

i ™5£ i S 'P 5 ^ ANAL 65536 tOT •ChicagoM2 PICTURE «8^R Overall i Funct icn" * SIZE 4 

@ PIXELS 81,26 SAY "Target:" STYLE 65536 FONT "Geneva",270 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 108,20 GET Target1 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 135,20 GET Target2 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 162,20 GET Target3 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 108,299 GET Object1 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 135,299 GET Object2 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 162,299 GET Object3 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1

8 PIXELS 276,324'GBT Bail STYIfi 6S536 FOOT -Chicago', 12 PICTURE " "8*R Run ; Bail oit-SIZE 4112 



* EOF: Subtraction 2.fmt
READ
IF Bail=2
CLEAR
CLOSE DATABASES
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
SET SAFETY ON
SCREEN 1 OFF
RETURN
STORE UPPER(Target1) TO Target1
STORE UPPER(Target2) TO Target2
STORE UPPER(Target3) TO Target3
STORE UPPER(Object1) TO Object1
STORE UPPER(Object2) TO Object2
SET TALK ON
GO INITIATE
COUNT TO TOT

COPY /TO TTMPRED FOR Da 1 12 ' .Cit.DsVOr'.OR^^ 1 ; ' " H ' ^ . i.occ" 
USE TEMPNUM

IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. Imatch=0
ELSE
COPY STRUCTURE TO TEMPDESIG
USE TEMPDESIG
IF Ematch=1
APPEND FROM TEMPNUM FOR D='E'
ENDIF
IF Hmatch=1
APPEND FROM TEMPNUM FOR D='H'
ENDIF
IF Omatch=1
APPEND FROM TEMPNUM FOR D='O'
ENDIF
IF Imatch=1
APPEND FROM TEMPNUM FOR D='I'.OR.D='X'.OR.D='N'
ENDIF
ENDIF
COUNT TO STARTOT

COPY STRUCTURE TO TEMPLIB
USE TEMPLIB
APPEND FROM TEMPDESIG FOR library=UPPER(Target1)
IF Target2<>' '
APPEND FROM TEMPDESIG FOR library=UPPER(Target2)
ENDIF
IF Target3<>' '
APPEND FROM TEMPDESIG FOR library=UPPER(Target3)
ENDIF
COUNT TO ANALTOT
USE TEMPDESIG
COPY STRUCTURE TO TEMPSUB
USE TEMPSUB
APPEND FROM TEMPDESIG FOR library=UPPER(Object1)
IF Target2<>' '
APPEND FROM TEMPDESIG FOR library=UPPER(Object2)
ENDIF
IF Target3<>' '
APPEND FROM TEMPDESIG FOR library=UPPER(Object3)
ENDIF
COUNT TO SUBTRACTOT
SET TALK OFF

* COMPRESSION SUBROUTINE A
? 'COMPRESSING QUERY LIBRARY'
USE TEMPLIB
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i/jg^^^w^^*^ , 

fr/JW-l ^ ;*-?.- 1 v I .;: *£>.,,..-- - v ... 

: loop * r "r " : " n * - w ~ rav- •ri.'.v*:* . 

END2P V'^ 

EOP a 1 

6TORE STO3T_TO ^3ESTA ■ -i., ...... . , 

flTTOE BNVtQC TO. TES2S - 

STORB 0,TO;CSSIQB * ' - £; - "' -^> 5 ,-'-. u ;. rvxv^ ■ ; 

*,;VCBliBlE - ; ' ^"t/^-r fV^:.._ iu:, : - v.: / = 

" DUP * DUP+i f, - r -^ --.^ ^-^V ^ !^T,:v : .v:., ; r-y^^ r vT V -- -p^n^vW; * 
.r "OP iV _" ( . . . 

fi«fea 

ENDDO TEST
LOOP
ENDDO ROLL
SORT ON RFEND/D,NUMBER TO TEMPTARSORT
USE TEMPTARSORT

iRggI« ftCg MJi SlftRT W1TO Rm©/IDSENB*10000 
COUNT TO/lSUPOftROO . 

* COMPRESSION SUBROUTINE B
? 'COMPRESSING TARGET LIBRARY'
USE TEMPSUB
SORT ON ENTRY,NUMBER TO SUBSORT
USE SUBSORT
OODH T TO 

REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2=0 ROLL
IF HARK1 >a SOK3B4E 
PACK
COUNT TO BUNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
STORE D TO DESIGA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
STORE D TO DESIGB
IF TESTA = TESTB .AND. DESIGA=DESIGB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
SORT ON RFEND/D,NUMBER TO TEMPSUBSORT
USE TEMPSUBSORT

*REPLWB ALL START SWISH : RFBKP/IPCBqEnOOOO 
COUNT TO TEMF$0BCO< ■ I ] < ^ ::■ V *. : ~ ; ; T . - 



* FUSION ROUTINE
? 'SUBTRACTING LIBRARIES'
USE SUBTRACTION
COPY STRUCTURE TO CRUNCHER
SELECT 2
USE TEMPSUBSORT
SELECT 1
USE CRUNCHER
APPEND FROM TEMPTARSORT
COUNT TO BAILOUT
MARK = 0
DO WHILE .T.
MARK = MARK+1
IF MARK>BAILOUT
EXIT
ENDIF
GO MARK
STORE ENTRY TO SCANNER
SELECT 2
LOCATE FOR ENTRY=SCANNER
IF FOUND()
STORE RFEND TO BIT1
STORE RFEND TO BIT2
ELSE
STORE 1/2 TO BIT1
STORE 0 TO BIT2
ENDIF
SELECT 1
REPLACE BGFREQ WITH BIT2
REPLACE ACTUAL WITH BIT1
ENDDO
REPLACE ALL RATIO WITH RFEND/ACTUAL
? 'DOING FINAL SORT BY RATIO'
SORT ON RATIO/D,BGFREQ/D,DESCRIPTOR TO FINAL
USE FINAL

DO CASE
CASE PTP=0
SET DEVICE TO PRINT
SET PRINT ON
EJECT
CASE PTP=1
SET ALTERNATE TO "Adenoid:Patent Figures:Subtraction.txt"
SET ALTERNATE ON



flfcfab.. Pisrotefg moo TO'.Piwrihs 



STUKg mmME - BTARTIKE.IO CQUFSBC 
^lukts GQMFSEC/60 TO COKBON 

Ji ''-' : ' ,{ °'-" i: -- 

SET- MARGIN 70 10 

SWT. ^Idhraiy.euteaaction Analysis- 6TVLE 65536 FONT ■Geneva', 274 COLOR 0,0,0,-1.-1 

77 I- 

*?? ms.M- . 

? 'Clone numbers '
?? STR(INITIATE,5,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
? Target1
IF Target2<>' '
?? Target2
ENDIF
IF Target3<>' '
?? ', '
?? Target3
ENDIF
? 'Subtracting: '
? Object1
IF Object2<>' '
?? ', '
?? Object2
ENDIF
IF Object3<>' '
?? ', '
?? Object3
ENDIF
? 'Designations: '
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0
?? 'All'
ENDIF
IF Ematch=1
?? 'Exact, '
ENDIF
IF Hmatch=1
?? 'Human, '
ENDIF
IF Omatch=1
?? 'Other sp. '
ENDIF
IF Imatch=1
?? 'INCYTE'
ENDIF
IF ANAL=1
? 'Sorted by ABUNDANCE'
ENDIF
IF ANAL=2
? 'Arranged by FUNCTION'
ENDIF




? 'Total clones represented: '
?? STR(TOT,5,0)
? 'Total clones analyzed: '
?? STR(STARTOT,5,0)
? 'Total computation time: '
?? STR(COMPMIN,5,2)
?? ' minutes'
?
? 'd = designation f = distribution z = location r = function s = species i = inte

~y SCREE; l TYFB 0 BEADING 'Screen 1* AT 40,2 SIZE 286,453 PIXELS FONT 'Geneva 1 '^ COLOR 0 0 0 
' "DO CASS . • ' •' ' 

- . . • . CASE ANAbal 
I^r i ... ?2,STR(AONXQUM#0) 

^ v. Tv;.nrW ; ' genes, .for---a total- of .«.- • . .. 

*7 ' clones' " • lK " ' t: ~ ' 3n 

;/ ? . ^ • 

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list off fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I
SET PRINT OFF
CLOSE DATABASES
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
CASE ANAL=2
* arrange/function
SET PRINT ON
SET HEADING ON
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
? '       BINDING PROTEINS'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Surface molecules and receptors: '

fCHf^JL TOPE^O HEADING 'Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT 'Geneva',7 COLOR 0,0,0, 
Ufit <*F fields lJinnber,D, ; P;Z f R ? Em^ ptJR rJb' . * 

, ' f^j^J,^^J!!f? m V .'*T « «■ S^F"* ,^ : ;i«vatica- ,265 WLOR 0 

- HSft P* 3 'Screen 1' AT '40,2' SIZE 286V492 PIXELS FONT ■Geneva", 7 COLOR 0,0,0, 

list .OFF fields xmber f V f T f z^iWm tor r^o 1 ^^ 

- * fCTOEN'i OTPE^O HEADING 'Screen 1' AT 40,2 SIZE 286 , 492 PIXELS FOOT *Relvetica* > 265 COLOR 0 

■sy 7 Llg snoa 'and aff actors: !;- r ' : ^ .-)< .. .. }> . ... .. 

; SCRHEiri OTPS 0 HEADING •Screen i" AT 40, 2 SIZE 286,492 PIXELS FONT •Geneva-,? COLOR 0,0,6, 

■ list OFF) fi«l^cflunber t D A F,z,^ % c ' ' 

: SOiKEN 1 TYPE 0 HEADHS3 . • Screen 1 • AT 40,2- SIZE 286,492. iPIXELS PONT "Helvetica ■ #265 COLOR 0 

■ >' moth er binding, proteins r' ^* ^ v 

SCREEN 1 TOPE -0 HEEDING .'Screen 1* AT '40, 2 SIZE '286, 492 PIXELS FONT 'Geneva', 7 COLOR 0,0.0, 
. . ; ; list OFF, fields :nun^,D,F, 2, R,^ te'Z*^^ ' 

• • : " ' " . . . • -..•„,• 

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
? '       ONCOGENES'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'General oncogenes: '
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='O'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'GTP-binding proteins: '
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number, D f F, Z , R, ENTRY, S , DESCRIPTOR, BSPREQ # RFSND, RATIO, X FOR Rn'O' 
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SOIEEN 1 TYPE 0 READING ■Screen 1" AT 40 2 ctto ^ojt* .inn 

7 'Xinaees and Phofiphataeegf? OT 40,2 SIZE 286,492 PIXELS FOOT "Helvetica** 255 Otft&R 0 
SCREEN 1 'TYPE 0 HEADING "Screen 3° AT 40 2 £T7.r m« zo* 
: ...^ OFF fields nunber.O.P^^sf^^^^^^.;^:;^ 

i- B^PS^^'SS^ »' * «■ ««.4« PIXBLS . F** 'Helvetica' 1 , 265 COU* . 

, - ISCRESN 1 TYPE 0 HEADHSG ."Screen 1" AT 40 2 SIZE 2iu 409 B «« * 

J? : ';6CBSSlJ.'VSPB<fr; HEADING "Screen 1" AT 40 2 sir* Sfl* ja* 
t^^F ^ l-T;; f^^^mSm l^&LSISfef 32 PIXELS •Helvetica" f 268 COLOR 0 

* ;u} SCREEN 1 TYPE 0 BEADIN3 ' "Screen" 1' AT 40 2 ST5H 5P< ja* ^ * 
* *->-* : SCREEtf X.IYPSj.0 HEADING "Screen 1' ATl^2 erre o« 4*1 * 

^^SiSS^^ *~» l ' * «' 2 to* »M« m - m -Helvetica-^ cwi 0 

'= ' -^iiSyfflr? iSCre6B V "» ^ -Helv.tiea.,265 co«K 0 

SCREEN 1 TYPE 0 ,fiEADING "Screen 1* AT 40 2 stze 5nc 

* . u-t,^ ^ ^D^^^s&sssiSjss .jg&. xTsrrs." 1 * °' o; °' 

: SSSBXITXSB 0 HEADEJG -Screen !■ AT 40,2 SIZE aec 409 tr«™,„ 

. ,ust off ^.D,^ a , R ,^V^^fl^!ffi,Si 0 ^:2.^ 

^j"" 1 TOTE 0 HEADCB 'Screen !• AT ,0., SI Z E 286 , 493 . ^ .He^.,^.^ „ 



? 



r SCREEN 1 TYPE 0 HHDING -Screen i« at 40 2 sizs sse 

SCREEN 1 TYPE 0 HEADING -Screen 1- AT 40 2 stze m« -on v 

I«t OFF ti^ -^MAfc— ?i»S^ 0.0,0, 

■^^^Assssasrr at 40:2 sizs ™>»> «— ^ •H 6 iv.uc a ., 365 0 

SCREEN 1 TYPE 0 KSASEQA "Screen I 9 AT 40.2 STZB 9fi(! zon r,,-u L . L „ 

list off tuu. n«ib e r, D ,F. 2tR .^ e S^ c ^ B ^^ ^ , J?SR::J. W 0,0,0, 

SCREEN 1 TYPE 0 HEADD33 "Screen 1" A3P 40,2 SI2fi 29fi zo^ btvwi, 

lint OFF fie** au^r (D( l^^ s fL 2 c^f^fS.S^ °'°' 0 ' 

-fSEi LSS,:^ «' 3 SI2E ^ .Rel^ica.^K COLOR 0 

«^ ' «>"->» I- AT 40,2 SIZS w ^ ^ . Genfiva . (7 ^ Q . . 
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list cot fields nuno^DfF^R.a^fS.iMaaPT^ for Rh'M 1 

SCREEN 1 TXPE 0, READING "Screen 1- AT 40,2 SIZE 286,492 PIXELS FCNT •SlfflSiJt* 3h Sfiok 0 

? ^gac leic acid metabolism; *• . . ' vww* w . 

SCREaJ 2, TYPE 0*HEADIM3 •Screen '1" AT 40,2 SIZE 286,492 PIXELS PCNT 'Geneva", 7 C M/OR 0,0.0 ' 

list. OFF fields number,D,F,Z,R,ENTRtf,S,DESCRIFTOR,BaFl^,RF^ FOR R^N 1 ' 

VIUBWl OTFE 0 HEADttO "Screen 1" AT 40,2 SIZE 286,492 PIXELS' FOOT •Helvetica ",265 COLOR 0 
*i, " ? ' * Lipid metabolism! ' 

{.SOffiWl WSf^O HEADIWO "Screen 1- AT 40,2 SIZE 286 ,492 PIXELS FOOT "Geneva-,7 COLOR 0,0,0, 
v r^ist opp fields »Bnber,D,P,Z l R,EfcnW,S,DBSCRIPIOR,BCPKra VGR Rb'W 

f ' 

5 v BCHMK 1 TYPE 0 HEADING "Screen 1" AT 40 r 2 SIZE 286,492 PIXELS FONT "Helvetica ",2 65 COLOR 0 
\- 7 /Other enzymes j V 

>.-: SCREEN IrCTFB 6 HEADING "ScredtU^M 40,2 SIZE 286,492 PIXELS PONT "Geneva*, 7 COLOR 0.0 0 

L .f?^~}<?*?* P ^JE^MCIW; *e««« r X?! AT; ioV2 rfiCEffl ae>7492r!.PIXELS FONT "Helvetica" ,268 COLO* 0 

SCREO^l TOHS 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS ' FOOT "Helvetica" , 265 COLOR 0 
~*' '• ?. 'Str ess response i * 

i "*'?P , KLi WrtJTJBMHW 'Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",? COLOR 0,0.0, 
tlist OFF fields xwn*ber,D,r,Z,R,EmY,S,DH£CRI^ tOR K= ! H' 

""' ?°SiJtSl:? * EMma " SCreea 1# ™»MM' PIXELS FOOT "Helveticd' ,265 COLOR '0 

fFKLi 5^ P ?.° HEM^ 'Screen 1" AT 40,2 SIZE 286,492 PIXELS FOOT "Geneva',7 COLOR 0,0,0, 
list OFF fields, nunter,D,F,Z,R,ara^,S,ffi^ r= ; k» 

' 1 7 F B 0 ? EftDDO ."Screen 1" M 40i2 SIZE 286,492 PIXBL9 FOOT ■Helvetica", 265 COLOR -0 

7 *otner clones i 

SCREEN! TYPE 0 .HEADING "Screen 1" 'AT 40,2 SIZE 286,492 PIXELS * FOOT" "Geneva" , 7. COLOR 0,0 0 
; list OFF.fields number, D,F,Z,R,ENTFY, 3, tBSCRIPTOR,BCFREQ,RFEND,RATIO,I FOR Ro'X 1 - ' 

. SCHEEW 1 TOPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FOOT "Helvetica ",265 COLOR 0 
? 'Clones ■ of unknown function j • . . * 

SCREarl JYPE^O HEADING "Screen 1' AT 40,2 SIZE 286,492 PIXELS FOOT 'Geneva',7 COLOR 0,0,0, 
list OFF fields nunker,D,?,z,R,ENiOT,s,nESCRiPTOR,BG FOR r-»u« 

ENDCASE
DO "Test print.prg"
SET PRINT OFF
SET DEVICE TO SCREEN
CLOSE DATABASES
ERASE TEMPLIB.DBF
ERASE TEMPNUM.DBF
ERASE TEMPDESIG.DBF
SET MARGIN TO 0
CLEAR
LOOP
ENDDO
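
The arithmetic of the fusion routine in the listing above can be illustrated in Python. The listing stores the background abundance in BGFREQ, substitutes 1/2 for entries absent from the subtracted library, and sorts on RATIO descending. The sketch below follows that rule under stated assumptions: the function name `subtract_libraries` is hypothetical, each library is given as a plain list of entry names, and the final tie-break uses the entry name where the listing uses DESCRIPTOR.

```python
from collections import Counter


def subtract_libraries(target_entries, background_entries):
    """Rank target-library entries by their abundance ratio over a background library."""
    target = Counter(target_entries)
    background = Counter(background_entries)

    rows = []
    for entry, rfend in target.items():
        bgfreq = background.get(entry, 0)
        # An entry never seen in the background is treated as 1/2 clone,
        # which avoids division by zero (as BIT1 = 1/2 does in the listing).
        actual = bgfreq if bgfreq else 0.5
        rows.append({"entry": entry, "bgfreq": bgfreq,
                     "rfend": rfend, "ratio": rfend / actual})

    # Highest ratio first, then highest background frequency, then name.
    rows.sort(key=lambda r: (-r["ratio"], -r["bgfreq"], r["entry"]))
    return rows
```

With this rule, an entry seen once in the target library and never in the background gets a ratio of 2.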
SET TALK OFF
SET PRINT OFF
SET EXACT OFF

h-«T»~J- STOKE 0 TO MtsaiS ' TO 1>c *3 ec t 

*" ru<r 0 '10 Zog 

STORE 1 TO Bail
DO WHILE .T.
* Program.: Northern (single).fmt
* Notes...: Format file Northern (single)
* EOF: Northern (single).fmt
IF Bail=2
CLEAR
SCREEN 1 OFF

IF Eobjeoto' . . 

IjOCATE FOR LooktrEobiect 
LOOP 

ajar? 



CLOSE DATABASES
ERASE "Lookup entry.dbf"
SET EXACT OFF

g SW OFP 

USE "Lookup descriptor.dbf"
CLEAR
LOOP
ENDIF
STORE Entry TO Searchval
CLOSE DATABASES
ERASE "Lookup descriptor.dbf"
SET EXACT ON
ENDIF

* TP *Mj8idboO 

: X CSE "5n»utOuytPo^ASE+/Mac!Fox files: clones, dbf * 
-00 NUmb Or- i, 

* BHWoK ' ...viU./'v--:. .'•■':*.' 0 
STORE 'Entry TO Sesrcbval 

BV2XF 

■•o^casMi-.' . v.. 

? 'Northern analysis for entry '
?? Searchval
? 'Enter Y to proceed: '
WAIT TO OK
CLEAR
IF UPPER(OK)<>'Y'
SCREEN 1 OFF
RETURN
ENDIF

* COMPRESSION SUBROUTINE FOR Library.dbf
? 'Compressing the Libraries file now...'
USE "SmartGuy:FoxBASE+/Mac:fox files:libraries.dbf"
SET SAFETY OFF
SORT ON library TO "Compressed libraries.dbf"
* FOR entered>0
SET SAFETY ON
USE "Compressed libraries.dbf"
DELETE FOR entered=0
PACK
COUNT TO TOT
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
SW2=1
LOOP
ENDIF
GO MARK1
STORE library TO TESTA
SKIP
STORE library TO TESTB
IF TESTA = TESTB
DELETE
ENDIF
MARK1 = MARK1+1
LOOP
ENDDO ROLL

* Northern analysis 
CLEAR 

? 'Doing the northern now...'
SET TALK ON
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
SET SAFETY OFF
COPY TO "Hits.dbf" FOR entry=searchval
SET SAFETY ON
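
The listing above copies every clone whose entry equals the search value into Hits.dbf. A Python sketch of that filtering step follows. The per-library tally is an assumption about how such hits might then be summarized, since the remainder of the Northern program is not shown here. The function name `electronic_northern` and the dictionary keys `library` and `entry` are likewise hypothetical.

```python
from collections import Counter


def electronic_northern(clones, searchval):
    """Count, per library, the clones whose entry equals `searchval`.

    `clones` is an iterable of dicts with "library" and "entry" keys
    (a record layout assumed for this sketch).
    """
    hits = [clone for clone in clones if clone["entry"] == searchval]
    return Counter(clone["library"] for clone in hits)
```

Calling `electronic_northern(clones, "HUMIL1")` on such records would give the number of matching clones found in each library.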
* MASTER ANALYSIS 3; VERSION 12-9-94
* Master menu for analysis output
CLOSE DATABASES
SET TALK OFF
SET SAFETY OFF
CLEAR
SET DEVICE TO SCREEN

S^?E£ TO 'SmartGuyrPoxfiASE+ZMac: fox filesiOut^ut progxnsi- 

GO TOP
STORE NUMBER TO INITIATE
GO BOTTOM
STORE NUMBER TO TERMINATE
STORE 0 TO ENTIRE
STORE 0 TO CONDEN
STORE 0 TO ANAL
STORE 0 TO EMATCH
STORE 0 TO HMATCH
STORE 0 TO OMATCH
STORE 0 TO IMATCH
STORE 0 TO XMATCH
STORE 0 TO PRINTON
STORE 0 TO PTP
DO WHILE .T.
* Program.: Master analysis.fmt
* Date....: 12/9/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Master analysis

f C Kri JS^S 0 " Screen L " AT 40 ' 2 286,492 PIXELS FONT 'Geneva- 9 COLOR OOO 

f £25* 2?' 255 TO 27 7,«D STYLE 28447 COLOR 0,ti;^l -25600?3~-l : - ' * 
! r 555S »- 1 78^« STYLE 3871 COLOR 0,o/-i/-a6w/-l,-l 

??< 98 "Customized Output Menu* STYLE 65536 KOT -Geneva- 274 COLOR 0 o 1 i 1 
6 PIXELS 45/54 GET conden STYLE 65536 FONT "Chicago-, 12 P^RE-I*? emA^^^lZr 1 ^ 1 

54,261 GET anal STYLE 65536 FOOT -cSclS',12 sSS?W SortSJ^ 
6 PIXELS 117,126 GEIVEMATCH STYLE 65536 FOOT -Chicago- 12 KOTJSE 5(1*5 ^T^iS^S^ 

I SSSS i 53 ' 126 ^ «536 FONT -Chicago- ,12 PICTURE 4*C OtSS^SS a L 1 II 

£ SSf !H 52 ^ "Matches; • STYLE 65536 FONT -(^ev^ 268 COLOR 0,0,°1 -1 -1 -1 ™ 1 
I SHP 3 S 3 ' 54 ^ PRINTON STYLE 65536 FOOT 'Chicago- , 12 PICTORE- @ *C iiDluk l[Jn* n.*w 

S'raS* 5S'Kf 25 J*? 6 * M 55536 «" 'Chiclgc%l2 S ?A^SS- S J?65» 
a K 2S'"S 021 i^tiate STYLE 0 FONT "Geneva',12 SIZE" 15/70 COLOR 0 0-1 -1 -1 -1 
9 PDms 270,146 GOT terminate STYLE 0 FOOT -Geneva-, 12 SIZE 15 70 oowft n k i i i i 
J .HSU 234,134 SAY -Include clones * STYLE 65536 FONT Sbw^U^SSr 6 0 C i 

5 •->- STYLE 65536 FOOT - Geneva -,iTc<^To -1 5 -1 °i 0 '' 1 ' ^ 

0 PIXELS 198,126 GST FW STYLE 65536 FONT -Chicago- 12 PICTORE °i*c Prinh ^ *<i- CTO i, ft 
S-'2SS5 i2 9 ' 0 TO 2S7,120 STYLE 3871 COLOR M?-l?-25600;^ C t0 file SIZE 15 ' 9 

1 f5^f 522 # ?«^L BLibrary "Action'. STYLE 65536 PONT 'Geneva -,266 COLC« 0.0 -1 -1 -1 -1 

6 PIXELS 227,18 GET ENTIRE STYLE 65536 FONT -Chicago',12 PIC^E ^..telectk-'sizE 16 

* EOF: Master analysis.fmt
READ
IF ANAL=9
CLEAR
CLOSE DATABASES
ERASE TEMPMASTER.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
SET SAFETY ON
SCREEN 1 OFF
RETURN
ENDIF
CLEAR
? INITIATE
? TERMINATE
? CONDEN
? ANAL
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* Hmatch 1 ,^ 

Retort* -V- -a^ri^, v.- 
?^twA < rcs . •■■ s ■ v -'*\. 

SEP 'TALK ON c " 

IP ENTIRES '*'*•'"■'' •Vjv-.I gM^OO- ^ 'u/a^ 

U3B •Unique libraries :dbf ■ 

t' REPLACE ALL' i WIW : -* " 4 • : - - Vl ' r ' ' . 

F ^^^7IEl^S i , libname , library , total , entered AT 0.6 

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
*USE TEMPNUM
COPY STRUCTURE TO TEMPLIB
USE TEMPLIB
IF ENTIRE=1
APPEND FROM "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
ENDIF
IF ENTIRE=2
USE "Unique libraries.dbf"

cop^ to ssu:^^ 

STORE RECCOUNT() TO STOPIT
STORE 1 TO MARK
DO WHILE .T.
IF MARK>STOPIT
CLEAR
EXIT
ENDIF
USE SELECTED
GO MARK
STORE library TO THISONE
? 'COPYING '
?? THISONE
USE TEMPLIB
APPEND FROM "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf" FOR library=THISONE
STORE MARK+1 TO MARK
LOOP
ENDDO
ENDIF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
COUNT TO STARTOT
COPY STRUCTURE TO TEMPDESIG
USE TEMPDESIG
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0
APPEND FROM TEMPLIB
ENDIF
IF Ematch=1
APPEND FROM TEMPLIB FOR D='E'
ENDIF
IF Hmatch=1
APPEND FROM TEMPLIB FOR D='H'
ENDIF
IF Omatch=1
APPEND FROM TEMPLIB FOR D='O'
ENDIF
IF Imatch=1
APPEND FROM TEMPLIB FOR D='I'.OR.D='X'.OR.D='N'
ENDIF
IF Xmatch=1
APPEND FROM TEMPLIB FOR D='X'
ENDIF
COUNT TO ANALTOT
SET TALK OFF
DO CASE
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aBTvDSWicS^ro PRINT * ^ ■ , , t.wc.-.'. r - c : , , r 

EJECT 

CftSE WFel f ^.-.K 



S^S^ 8 TO "^al function aort.txfc- 
S''£S£££ 12 ,; *■« Stlw 2:PuMtSTso^ txt- 



. ,Data ^" ^^inalysis- STOLE 65536 PONT -Garttva-,274 COLOR 0,0,0,-1,-!,-! 

i > : •: ^x^vjt. m.-;^-*' ^c^j^. • 

? DATE()
? 'Clone numbers '
?? STR(INITIATE,6,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '

? 'All libraries 1 ' " ' " ~ ? ' ' - ' ' '• v - ' . t. l^vc^; - . . v,;^ - . ; \v^ 

ENDIF • - . 

MARK=1
DO WHILE .T.
IF MARK>STOPIT
EXIT
ENDIF
USE SELECTED
GO MARK
? TRIM(libname)
STORE MARK+1 TO MARK
LOOP
ENDDO
ENDIF

? 'Designations: '
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0
ENDIF
IF Ematch=1
?? 'Exact, '
ENDIF
IF Hmatch=1
?? 'Human, '
ENDIF
IF Omatch=1
?? 'Other sp. '
ENDIF
IF Imatch=1
?? 'INCYTE'
ENDIF
IF Xmatch=1
?? 'EST'
? 'Condensed format analysis'
ENDIF
IF ANAL=2
? 'Sorted by ENTRY'
ENDIF
IF ANAL=3
? 'Sorted by ABUNDANCE'
ENDIF
IF ANAL=4
? 'Sorted by INTEREST'
ENDIF
IF ANAL=5
? 'Arranged by LOCATION'
ENDIF
IF ANAL=6
? 'Arranged by DISTRIBUTION'
ENDIF
IF ANAL=7
? 'Arranged by FUNCTION'
ENDIF
? 'Total clones represented: '
?? STR(STARTOT,6,0)
? 'Total clones analyzed: '
?? STR(ANALTOT,6,0)
?
? 'l = library d = designation f = distribution z = location r = function c = cer
**********************************************
USE TEMPDESIG
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
DO CASE
CASE ANAL=1
* sort/number
SET HEADING ON
IF CONDEN=1
SORT TO TEMP1 ON ENTRY,NUMBER
DO "COMPRESSION number.PRG"
ELSE
SORT TO TEMP1 ON NUMBER
USE TEMP1
list off fields number,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR
*list off fields number,L,D,F,Z,R,C,ENTRY,S,DE
CLOSE DATABASES
ERASE TEMP1.DBF
ENDIF
CASE ANAL=2
* sort/DESCRIPTOR
SET HEADING ON
*SORT TO TEMP1 ON DESCRIPTOR,ENTRY,NUMBER/S for D='E'.OR.D='H'.OR.D='O'.OR.D='X'.OR.D='I'
*SORT TO TEMP1 ON ENTRY,DESCRIPTOR,NUMBER/S for D='E'.OR.D='H'.OR.D='O'.OR.D='X'.OR.D='I'
SORT TO TEMP1 ON ENTRY,START/S for D='E'.OR.D='H'.OR.D='O'.OR.D='X'.OR.D='I'
IF CONDEN=1
DO "COMPRESSION entry.PRG"
ELSE
USE TEMP1
list off fields number,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,RFEND,INIT,I
CLOSE DATABASES
ERASE TEMP1.DBF
ENDIF
* sort by abundance
CASE ANAL=4
* sort/interest
SET HEADING ON
IF CONDEN=1
SORT TO TEMP1 ON ENTRY,NUMBER FOR I>0
DO "COMPRESSION interest.PRG"
ELSE
SORT ON I/D,ENTRY TO TEMP1 FOR I>1
ERASE TEMP1.DBF
ENDIF
CASE ANAL=5
* arrange/location
SET HEADING ON
STORE 4 TO AMPLIFIER
? 'Nuclear: '
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Cytoplasmic: '
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Cytoskeleton: '
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Cell surface: '
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Intracellular membrane: '
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Mitochondrial: '
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
DO "Compression location.prg"
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^6mj^jmFX t NUMBER FIELDS RF^, NUMBER, L,D,F,Z,R,C, STORY, S, DESCRIPTOR, 

$F,0GNDEN»1.., .„ { 

:W:"caofcf(&ssi6a location. prg" 

l SUSE ' ' 

;pp; J| *Noijnal subroutine 1" 
ENDIF ' " >! ' ' ' ■' * l,Ji * 

SpJPT CN ENTRY, NUMBER FIELDS RF£ND,NUMBER,L,D,F,Z,R,C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, CCMMEN 

D6 Va Carrpreaflioh locatiori.prg - 

ELSE." 

DO'* "Normal subroutine 1* 
^/Unknown i 1 

SORT ON ENTRY, NUMBER FIELDS RFaiD, NUMBER, L,D,F, Z, R,C,EWERY,S, DESCRIPTOR, LENGTH, INIT, I, CCXMa? 

tip' coNna^i • 

; DO ^Compression,, location .prg" 



;DD' "Normal subroutine 1" 

'ENDIF.!"...:V* '■''v., !.]';';- ' 

IF. CCNDENsi 

SET DEVICE, TO PRINTER 

SET. PRINTER GEN 

EJECT 

KV "Output heading.prg' 
USE •Analysis location. dbf f 
DO 'Create ^bargraph.prg" 
\$ET -HEADING OFF 

?? 1 FUNCTIONAL CLASS TOTAL UNIQUE NEW % TOTAL 1 

LIST OFF FIELDS Z,NAME, CLONES, GENES, NEW, PERCENT, GRAPH 
CLOSE DATABASES 
ERASE TEKP2.DBF 
SET HEADING ON 

*USE "StaartGuyiFoxBAS3+/Kac:fox files iTEMEMASTER.dbf • 
SNDIF v . / V ; - * • ■ ■ • 

CASE ANAL=6
* arrange/distribution
SET HEADING ON
STORE .3 TO AMPLIFIER
? 'Cell/tissue specific distribution:'
SORT ON ENTRY, NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression distrib.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Non-specific distribution:'
SORT ON ENTRY, NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression distrib.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Unknown distribution:'
SORT ON ENTRY, NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression distrib.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
IF CONDEN=1
  SET DEVICE TO PRINTER
  SET PRINTER ON
  DO "Output heading.prg"
  USE "Analysis distribution.dbf"
  DO "Create bargraph.prg"
  SET HEADING OFF
  ?? ' FUNCTIONAL CLASS TOTAL UNIQUE % TOTAL'
  LIST OFF FIELDS P,NAME,CLONES,GENES,PERCENT,GRAPH
  CLOSE DATABASES
  ERASE TEMP2.DBF
  SET HEADING ON
  *USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"
ENDIF

CASE ANAL=7
* arrange/function
SET HEADING ON
STORE 10 TO AMPLIFIER
? ' BINDING PROTEINS'
? 'Surface molecules and receptors:'
SORT ON ENTRY, NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Calcium-binding proteins:'
SORT ON ENTRY, NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Ligands and effectors:'
SORT ON ENTRY, NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Other binding proteins:'
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF

*EJECT
? ' ONCOGENES'
? 'General oncogenes:'
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'GTP-binding proteins:'
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Viral elements:'
SORT ON ENTRY, NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Kinases and phosphatases:'
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Tumor-related antigens:'
SORT ON ENTRY, NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF

*EJECT
? ' PROTEIN SYNTHETIC MACHINERY PROTEINS'
? 'Transcription and Nucleic Acid-binding proteins:'
SORT ON ENTRY, NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Translation:'
SORT ON ENTRY, NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Ribosomal proteins:'
SORT ON ENTRY, NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Protein processing:'
SORT ON ENTRY, NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF

*EJECT
? ' ENZYMES'
?
? 'Ferroproteins:'
SORT ON ENTRY, NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Proteases and inhibitors:'
SORT ON ENTRY, NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Sugar metabolism:'
SORT ON ENTRY, NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Amino acid metabolism:'
SORT ON ENTRY, NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Nucleic acid metabolism:'
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Lipid metabolism:'
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Other enzymes:'
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF



*EJECT
? ' MISCELLANEOUS CATEGORIES'
? 'Stress response:'
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Structural:'
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Other clones:'
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Clones of unknown function:'
SORT ON ENTRY, NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression function.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF

IF CONDEN=1
  EJECT
  *SET DEVICE TO PRINTER
  *SET PRINT ON
  DO "Output heading.prg"
  ***
  USE "Analysis function.dbf"
  DO "Create bargraph.prg"
  SET HEADING OFF
  ***
  SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",12 COLOR 0,0,0
  ?
  ? ' TOTAL TOTAL NEW DIST'
  ? ' FUNCTIONAL CLASS CLONES GENES GENES FUNCTIONAL CLASS'
  ***
  *LIST OFF FIELDS P,NAME,CLONES,GENES,NEW,PERCENT,GRAPH,COMPANY
  LIST OFF FIELDS P,NAME,CLONES,GENES,NEW,PERCENT,GRAPH
  CLOSE DATABASES
  ERASE TEMP2.DBF
  SET HEADING ON
  *USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"
ENDIF

CASE ANAL=8
DO "Subgroup summary a.prg"
ENDCASE

DO "Test print.prg"
SET PRINT OFF
SET DEVICE TO SCREEN
CLOSE DATABASES
*ERASE TEMPLIB.DBF
*ERASE TEMPNUM.DBF
*ERASE TEMPDESIG.DBF
*ERASE SELECTED.DBF
CLEAR
LOOP
ENDDO
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    COUNT TO NEWGENES FOR D='H'.OR.D='O'
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW=1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
*GO TOP
STORE Z TO LOC
USE "Analysis location.dbf"
LOCATE FOR Z=LOC
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
REPLACE NEW WITH NEWGENES
USE TEMP1
SORT ON RFEND/D TO TEMP2
USE TEMP2
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
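
The compression step above can be read as a run-length count over clone records sorted by ENTRY: duplicates are deleted, the surviving record carries the number of coincident clones in RFEND, and records designated 'H' or 'O' are tallied as new genes. The following is a minimal Python sketch of that logic, not the original program. The dictionary keys `entry`, `d`, and `rfend` are assumed stand-ins for the ENTRY, D, and RFEND fields, and `GENE_X` in the usage note is a hypothetical entry name.

```python
from itertools import groupby


def compress(records):
    """Collapse clone records into one row per ENTRY with a coincidence count.

    Returns (unique_rows, tot, newgenes), mirroring the TOT, UNIQUE and
    NEWGENES variables of the listing.
    """
    tot = len(records)  # COUNT TO TOT
    # The listing expects TEMP1 to be sorted on ENTRY already; sort here to be safe.
    ordered = sorted(records, key=lambda r: r["entry"])
    unique_rows = []
    for _, group in groupby(ordered, key=lambda r: r["entry"]):
        rows = list(group)
        kept = dict(rows[0])        # first record of each run survives
        kept["rfend"] = len(rows)   # REPLACE RFEND WITH DUP
        unique_rows.append(kept)
    # COUNT TO NEWGENES FOR D='H'.OR.D='O' runs after PACK, i.e. over unique rows.
    newgenes = sum(1 for r in unique_rows if r["d"] in ("H", "O"))
    # SORT ON RFEND/D: most abundant genes first.
    unique_rows.sort(key=lambda r: r["rfend"], reverse=True)
    return unique_rows, tot, newgenes
```

For example, three clones of `HUMEF1B` and one clone of `GENE_X` designated 'H' would yield two unique rows with counts 3 and 1, a total of 4 clones, and one new gene.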
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW=1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON DATE TO TEMP2
USE TEMP2
?? STR(UNIQUE,4,0)
?? ' genes, for a total of '
?? STR(TOT,4,0)
?? ' clones'
?
? ' V Coincidence'

COUNT TO P4 FOR I=4
IF P4>0
  ? STR(P4,3,0)
  ?? ' genes with priority = 4 (Secondary analysis:)'
  list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=4
  ?
ENDIF
COUNT TO P3 FOR I=3
IF P3>0
  ? STR(P3,3,0)
  ?? ' genes with priority = 3 (Full insert sequence:)'
  list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=3
ENDIF
COUNT TO P2 FOR I=2
IF P2>0
  ? STR(P2,3,0)
  ?? ' genes with priority = 2 (Primary analysis complete:)'
  list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=2
  ?
ENDIF
COUNT TO P1 FOR I=1
IF P1>0
  ? STR(P1,3,0)
  list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=1
ENDIF
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
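
The priority report above counts the compressed genes at each value of the I field, from 4 down to 1, and lists a group only when its count is greater than zero. A minimal Python sketch of that grouping follows. The key name `i` is an assumed stand-in for the I field, and the label for priority 1 is omitted because it is not legible in the listing.

```python
from collections import defaultdict

# Labels as they appear in the listing; priority 1 has no recoverable label.
PRIORITY_LABELS = {
    4: "Secondary analysis",
    3: "Full insert sequence",
    2: "Primary analysis complete",
}


def group_by_priority(rows):
    """Return [(priority, count, members)] for priorities 4..1, skipping empty groups."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["i"]].append(row)
    report = []
    for priority in (4, 3, 2, 1):          # same order as the COUNT ... FOR I=n blocks
        members = groups.get(priority, [])
        if members:                          # IF Pn>0
            report.append((priority, len(members), members))
    return report
```

Rows whose priority falls outside 1 to 4 are ignored, which matches the listing, where only those four values are counted.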
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW=1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON NUMBER TO TEMP2
USE TEMP2
?? STR(UNIQUE,4,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    COUNT TO NEWGENES FOR D='H'.OR.D='O'
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW=1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
GO TOP
STORE R TO FUNC
USE "Analysis function.dbf"
LOCATE FOR P=FUNC
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
REPLACE NEW WITH NEWGENES
USE TEMP1
SORT ON RFEND/D TO TEMP2
USE TEMP2
SET HEADING ON
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
***
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,

♦SKff fH?da° Sg;,2S5& AT 40 ' 2 SI2B 286 ' 4M "« ««* 0.0. 

*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW=1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
GO TOP
STORE P TO DIST
USE "Analysis distribution.dbf"
LOCATE FOR P=DIST
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
USE TEMP1
SORT ON RFEND/D TO TEMP2
USE TEMP2
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW=1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
GO TOP
USE TEMP1
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW=1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON RFEND/D,NUMBER TO TEMP2
USE TEMP2
REPLACE ALL START WITH RFEND/IDGENE*10000
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' Coincidence V Clones/10000'
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"



set heading off 



SCREEN 1 TYPE 0 HEADING • 
list fields number, RFEND, 
*SET PRINT OFF 
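
The abundance variant above rescales each gene's coincidence count with `REPLACE ALL START WITH RFEND/IDGENE*10000`, so that libraries of different sizes can be compared on a per-10,000-clone basis. A minimal Python sketch of that arithmetic follows. IDGENE is not defined in this part of the listing, so treating it as the number of identified clones in the selection is an assumption.

```python
def clones_per_10000(rfend, idgene):
    """Normalize a coincidence count to clones per 10,000.

    rfend  -- number of clones that matched one ENTRY (the RFEND field)
    idgene -- assumed to be the total number of identified clones analyzed
    """
    if idgene <= 0:
        raise ValueError("idgene must be positive")
    return rfend / idgene * 10000
```

Under that assumption, a gene seen 6 times among 12,000 identified clones would be reported as roughly 5 clones per 10,000.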
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW=1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON RFEND/D,NUMBER TO TEMP2
USE TEMP2
REPLACE ALL START WITH RFEND/IDGENE*10000
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' Coincidence V Clones/10000'
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
set heading off
USE TEMP1
?? ' Total of '
?? STR(TOT,4,0)
?? ' clones'
?
*list off fields number,L,D,F,Z,R,C,ENTRY
list off fields number,L,
close databases
erase temp1.dbf
use TEMPDESIG
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Uifescan i&enb?' version $-7-94 ,v ^' 

SET TALK OFF
set device to screen
CLEAR
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
STORE LUPDATE() TO Update
GO BOTTOM
STORE RECNO() TO cloneno
STORE 6 TO Chooser
* Program.: Lifeseq menu.fmt
* Date....: 1/11/95
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Lifeseq menu



a oiJS'f 1Z*A& ^ otal clones: "'STOLE 65536 FONT •' 'Geneva" ,12 GODQR o o i V i , 

• SlXBtS 4»,296 SUf •yl.30" *5536 FCOT ^ev^ 

* EOF: Lifeseq menu.fmt
READ
DO CASE
CASE Chooser=1
  DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Master analysis 3.prg"
CASE Chooser=2
  DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Subtraction 2.prg"
CASE Chooser=3
  DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Northern (single).prg"
CASE Chooser=4
  USE "Libraries.dbf"
CASE Chooser=5
  DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:See individual clone.prg"
CASE Chooser=6
  DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Menu.prg"
  CLEAR
  SCREEN 1 OFF
  RETURN
ENDCASE
LOOP
@ 1,30 SAY "Database Subset Analysis" STYLE
?
?? TIME()
? 'Clone numbers '
?? STR(INITIATE,6,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
IF ENTIRE=1
  ?? 'All libraries'
ENDIF
IF ENTIRE=2
  MARK=1
  DO WHILE .T.
    IF MARK>STOPIT
      EXIT
    ENDIF
    USE SELECTED
    GO MARK
    ?? TRIM(libname)
    STORE MARK+1 TO MARK
    LOOP
  ENDDO
ENDIF
? 'Designations: '
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0
  ?? 'All'
ENDIF
IF Ematch=1
  ?? 'Exact,'
ENDIF
IF Hmatch=1
  ?? 'Human,'
ENDIF
IF Omatch=1
  ?? 'Other sp.'
ENDIF
IF CONDEN=1
  ? 'Condensed format analysis'
ENDIF
IF ANAL=1
  ? 'Sorted by NUMBER'
ENDIF
IF ANAL=2
  ? 'Sorted by ENTRY'
ENDIF
IF ANAL=3
  ? 'Arranged by ABUNDANCE'
ENDIF
IF ANAL=4
  ? 'Sorted by INTEREST'
ENDIF
IF ANAL=5
  ? 'Arranged by LOCATION'
ENDIF
IF ANAL=6
  ? 'Arranged by DISTRIBUTION'
ENDIF
IF ANAL=7
  ? 'Arranged by FUNCTION'
ENDIF



\ t FONT "Geneva - , 274 COLOR 0,0,0,-1,-1,-1 



WO 95/20681

PCT/US95/01160



? 'Total clones represented: '
? 'Total clones analyzed: '



? 
USE TEMP1
COUNT TO TOT
?? ' Total of '
?? STR(TOT,4,0)
?? ' clones'
*list off fields number,L,D,F,Z,R,C,ENTRY,DESCRIPTOR,LENGTH,RFEND,INIT,I
list off fields number,L,D,F,Z,R,C,ENTRY,DESCRIPTOR
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG
USE TEMP1

COUNT TO TOT

77 • totaled w< r^c; . ^cr- rr r^v u ,, 

? ; i ilA^V * 

USE TEMPDESIG
* Northern (single), version 11-25-94
close databases
SET TALK OFF
SET PRINT OFF
SET EXACT OFF
CLEAR
STORE ' ' TO Eobject
STORE ' ' TO Dobject
STORE 0 TO Numb
STORE 0 TO Zog
STORE 1 TO Bail
DO WHILE .T.

* Program.: Northern (single).fmt
* Date....: 8/ 8/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Northern (single)
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",12 COLOR 0,0,0
@ PIXELS 15,81 TO 46,397 STYLE 28447 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 89,79 TO 192,422 STYLE 28447 COLOR 0,0,0,-25600,-1,-1
@ PIXELS 115,98 SAY "Entry #:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 115,173 GET Eobject STYLE 0 FONT "Geneva",12 SIZE 15,142 COLOR 0,0,0,-1,-1,-1
@ PIXELS 145,89 SAY "Description" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 145,173 GET Dobject STYLE 0 FONT "Geneva",12 SIZE 15,241 COLOR 0,0,0,-1,-1,-1
@ PIXELS 35,89 SAY "Single Northern search screen" STYLE 65536 FONT "Geneva",274 COLOR 0,0,-
@ PIXELS 220,162 GET Bail STYLE 65536 FONT "Chicago",12 PICTURE "@*R Continue;Bail out" SIZE
@ PIXELS 175,98 SAY "Clone #:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,
@ PIXELS 175,173 GET Numb STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,0,-1,-1,-1
@ PIXELS 80,152 SAY "Enter any ONE of the following:" STYLE 65536 FONT "Geneva",12 COLOR -1,
* EOF: Northern (single).fmt
READ

IF Bail=2
  CLEAR
  screen 1 off
  RETURN
ENDIF
USE "SmartGuy:FoxBASE+/Mac:Fox files:lookup.dbf"
SET TALK ON
IF Eobject<>' '
  STORE UPPER(Eobject) TO Eobject
  SET SAFETY OFF
  SORT ON Entry TO "Lookup entry.dbf"
  SET SAFETY ON
  USE "Lookup entry.dbf"
  LOCATE FOR Look=Eobject
  IF .NOT.FOUND()
    CLEAR
    LOOP
  ENDIF
  BROWSE
  STORE Entry TO Searchval
  CLOSE DATABASES
  ERASE "Lookup entry.dbf"
ENDIF
IF Dobject<>' '
  SET EXACT OFF
  SET SAFETY OFF
  SORT ON descriptor TO "Lookup descriptor.dbf"
  SET SAFETY ON
  USE "Lookup descriptor.dbf"
  LOCATE FOR UPPER(TRIM(descriptor))=UPPER(TRIM(Dobject))
  IF .NOT.FOUND()
    CLEAR
    LOOP
  ENDIF
  BROWSE
  STORE Entry TO Searchval
  CLOSE DATABASES
  ERASE "Lookup descriptor.dbf"
  SET EXACT ON
ENDIF

IF Numb<>0
  USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
  GO Numb
  BROWSE
  STORE Entry TO Searchval
ENDIF

CLEAR
? 'Northern analysis for entry '
?? Searchval
? 'Enter Y to proceed'
WAIT TO Ok
CLEAR
IF UPPER(OK)<>'Y'
  screen 1 off
  RETURN
ENDIF

* COMPRESSION SUBROUTINE FOR Library.dbf
? 'Compressing the Libraries file now...'
USE "SmartGuy:FoxBASE+/Mac:Fox files:libraries.dbf"
SET SAFETY OFF
SORT ON library TO "Compressed libraries.dbf"
* FOR entered>0
SET SAFETY ON
USE "Compressed libraries.dbf"
DELETE FOR entered=0
PACK
COUNT TO TOT
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  STORE library TO TESTA
  SKIP
  STORE library TO TESTB
  IF TESTA = TESTB
    DELETE
  ENDIF
  MARK1 = MARK1+1
  LOOP
ENDDO ROLL

* Northern analysis
CLEAR
? 'Doing the northern now...'
SET TALK ON
USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
SET SAFETY OFF
COPY TO "Hits.dbf" FOR entry=searchval
SET SAFETY ON
CLOSE DATABASES
SELECT 1
USE "Compressed libraries.dbf"
STORE RECCOUNT() TO Entries
SELECT 2
USE "Hits.dbf"
SELECT 1
DO WHILE .T.
  IF Mark>Entries
    EXIT
  ENDIF
  GO MARK
  STORE library TO Jigger
  SELECT 2
  COUNT TO Zog FOR library=Jigger
  SELECT 1
  REPLACE hits WITH Zog
  Mark=Mark+1
  LOOP
ENDDO

SELECT 1
BROWSE FIELDS LIBRARY,LIBNAME,ENTERED,HITS AT 0,0
CLEAR
? 'Enter Y to print: '
WAIT TO FRINSET
IF UPPER(FRINSET)='Y'
  SET PRINT ON
  CLEAR
  EJECT
  SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",14 COLOR 0,0,0
  ? 'DATABASE ENTRIES MATCHING ENTRY '
  ?? Searchval
  ? DATE()
  SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
  LIST OFF FIELDS library,libname,entered,hits
  ?
  SELECT 2
  LIST OFF FIELDS NUMBER,LIBRARY,D,S,F,Z,R,ENTRY,DESCRIPTOR,RFSTART,START,RFEND
  SET TALK OFF
  SET PRINT OFF
ENDIF
CLOSE DATABASES
SET TALK OFF
CLEAR
DO "Test print.prg"
RETURN
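
The Northern (single) program above copies every clone whose ENTRY matches the search value into a hits file, then counts those hits library by library. A minimal Python sketch of that tally follows, under the assumption that each clone is a dictionary with `library` and `entry` keys. The example reuses the six HUMEF1B rows shown in Table 6 plus one hypothetical non-matching clone.

```python
from collections import Counter


def electronic_northern(clones, search_entry):
    """Count, per library, the clones whose entry matches search_entry.

    Returns (table, hits): table is a sorted list of (library, hit_count)
    covering every library present in `clones`; hits is the matching clones.
    """
    hits = [c for c in clones if c["entry"] == search_entry]   # COPY TO "Hits.dbf" FOR ...
    per_library = Counter(c["library"] for c in hits)          # COUNT TO Zog FOR library=Jigger
    libraries = sorted({c["library"] for c in clones})         # compressed library list
    table = [(lib, per_library.get(lib, 0)) for lib in libraries]
    return table, hits


if __name__ == "__main__":
    clones = [
        {"number": 2304, "library": "U937NOT01", "entry": "HUMEF1B"},
        {"number": 3240, "library": "HMC1NOT01", "entry": "HUMEF1B"},
        {"number": 3269, "library": "HMC1NOT01", "entry": "HUMEF1B"},
        {"number": 4693, "library": "HMC1NOT01", "entry": "HUMEF1B"},
        {"number": 8389, "library": "HMC1NOT01", "entry": "HUMEF1B"},
        {"number": 9139, "library": "HMC1NOT01", "entry": "HUMEF1B"},
        {"number": 1, "library": "THP1NOB01", "entry": "OTHER"},  # hypothetical clone
    ]
    for library, count in electronic_northern(clones, "HUMEF1B")[0]:
        print(library, count)
```

Libraries with no hits are kept in the result with a count of zero, as in the listing, where every row of the compressed libraries file receives a `hits` value.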
TABLE 6

library     libname
ADENINB01   Inflamed adenoid
ADRENOR01   Adrenal gland (r)
ADRENOT01   Adrenal gland (T)
AMLBNOT01   AML blast cells (T)
BMARNOT01   Bone marrow
BMARNOT02   Bone marrow (T)
CARDNOT01   Cardiac muscle (T)
CHAONOT01   Chin. hamster ovary
CORNNOT01   Corneal stroma
FIBRAGT01   Fibroblast, AT 5
FIBRAGT02   Fibroblast, AT 30
FIBRANT01   Fibroblast AT
FIBRNGT01   Fibroblast uv 5
FIBRNGT02   Fibroblast, uv 30
FIBRNOT01   Fibroblast
FIBRNOT02   Fibroblast normal
HMC1NOT01   Mast cell line HMC-1
HUVELPB01   HUVEC IFN,TNF,LPS
HUVENOB01   HUVEC control
HUVESTB01   HUVEC shear stress
HYPONOB01   Hypothalamus
KIDNNOT01   Kidney (T)
LIVRNOT01   Liver (T)
LUNGNOT01   Lung (T)
MUSCNOT01   Skeletal muscle (T)
OVIDNOB01   Oviduct
PANCNOT01   Pancreas, normal
PITUNOR01   Pituitary (r)
PITUNOT01   Pituitary (T)
PLACNOB01   Placenta
SINTNOT02   Small intestine (T)
SPLNFET01   Spleen/liver, fetal
SPLNNOT02   Spleen (T)
STOMNOT01   Stomach
SYNORAB01   Rheum. synovium
TBLYNOT01   T + B lymphoblast
TESTNOT01   Testis (T)
THP1NOB01   THP-1 control
THP1PEB01   THP phorbol
THP1PLB01   THP phorbol LPS
U937NOT01   U937, monocytic leuk

number  library    d s f z r  entry    descriptor                rfstart  start  rfend
2304    U937NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta        0      0    773
3240    HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta        0    370    773
3269    HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta        0    371    773
4693    HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta        0    470    773
8389    HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta        0    327    773
9139    HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta        0    375    773
WHAT IS CLAIMED IS:

1. A method of analyzing a specimen containing gene
transcripts, said method comprising the steps of:

(a) producing a library of biological sequences;

(b) generating a set of transcript sequences, where
each of the transcript sequences in said set is indicative
of a different one of the biological sequences of the
library;

(c) processing the transcript sequences in a
programmed computer in which a database of reference
transcript sequences indicative of reference biological
sequences is stored, to generate an identified sequence
value for each of the transcript sequences, where each said
identified sequence value is indicative of a sequence
annotation and a degree of match between one of the
transcript sequences and at least one of the reference
transcript sequences; and

(d) processing each said identified sequence value to
generate final data values indicative of a number of times
each identified sequence value is present in the library.



2. The method of claim 1, wherein step (a) includes
the steps of:

obtaining a mixture of mRNA;

making cDNA copies of the mRNA;

isolating a representative population of clones
transfected with the cDNA and producing therefrom the
library of biological sequences.

3. The method of claim 1, wherein the biological
sequences are cDNA sequences.

4. The method of claim 1, wherein the biological
sequences are RNA sequences.

5. The method of claim 1, wherein the biological
sequences are protein sequences.
6. The method of claim 1, wherein a first value of
said degree of match is indicative of an exact match, and a
second value of said degree of match is indicative of a
non-exact match.

7. A method of comparing two specimens containing
gene transcripts, said method comprising:

(a) analyzing a first specimen according to the
method of claim 1;

(b) producing a second library of biological
sequences;

(c) generating a second set of transcript sequences,
where each of the transcript sequences in said second set
is indicative of a different one of the biological
sequences of the second library;

(d) processing the second set of transcript sequences
in said programmed computer to generate a second set of
identified sequence values known as further identified
sequence values, where each of the further identified
sequence values is indicative of a sequence annotation and
a degree of match between one of the biological sequences
of the second library and at least one of the reference
sequences;

(e) processing each said further identified sequence
value to generate further final data values indicative of a
number of times each further identified sequence value is
present in the second library; and

(f) processing the final data values from the first
specimen and the further identified sequence values from
the second specimen to generate ratios of transcript
sequences, each of said ratio values indicative of
differences in numbers of gene transcripts between the two
specimens.
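
Step (f) of claim 7 can be illustrated with a short Python sketch that tallies how often each identified sequence occurs in two libraries and forms the ratio of the two counts. This is only one reading of the claim. The function name, the use of raw counts rather than counts normalized for library size, and returning `None` when a transcript is absent from the second specimen are all assumptions made for illustration.

```python
from collections import Counter


def transcript_ratios(first_library, second_library):
    """Ratio of occurrence counts (first / second) for every identified sequence.

    Each argument is a list of identified sequence values, one per clone.
    A sequence missing from the second library has no finite ratio, so it
    maps to None.
    """
    first_counts = Counter(first_library)
    second_counts = Counter(second_library)
    ratios = {}
    for entry in sorted(set(first_counts) | set(second_counts)):
        if second_counts[entry]:
            ratios[entry] = first_counts[entry] / second_counts[entry]
        else:
            ratios[entry] = None
    return ratios
```

A ratio well above or below 1 flags a transcript that is differentially represented between the two specimens, while a value of `None` or 0 marks a transcript seen in only one of them.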



8. A method of quantifying relative abundance of mRNA
in a biological specimen, said method comprising the steps
of:

(a) isolating a population of mRNA transcripts from
the biological specimen;

(b) identifying genes from which the mRNA was
transcribed by a sequence-specific method;

(c) determining numbers of mRNA transcripts
corresponding to each of the genes; and

(d) using the mRNA transcript numbers to determine
the relative abundance of mRNA transcripts within the
population of mRNA transcripts.

9. A diagnostic method which comprises producing a
gene transcript image, said method comprising the steps of

(a) isolating a population of mRNA transcripts from
a biological specimen;

(b) identifying genes from which the mRNA was
transcribed by a sequence-specific method;

(c) determining numbers of mRNA transcripts
corresponding to each of the genes; and

(d) using the mRNA transcript numbers to determine
the relative abundance of mRNA transcripts within the
population of mRNA transcripts, where data determining the
relative abundance values of mRNA transcripts is the gene
transcript image of the biological specimen.

10. The method of claim 9, further comprising:

(e) providing a set of standard normal and diseased
gene transcript images; and

(f) comparing the gene transcript image of the
biological specimen with the gene transcript images of step
(e) to identify at least one of the standard gene
transcript images which most closely approximates the gene
transcript image of the biological specimen.
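
The comparison in step (f) of claim 10 can be sketched in Python as a nearest-match search over reference images. The claim does not say how closeness is measured, so the sum of absolute differences in relative abundance used here is purely an assumption, as are the function names and the dictionary layout (gene name mapped to transcript count).

```python
def relative_abundance(counts):
    """Convert {gene: transcript count} into {gene: fraction of all transcripts}."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("image contains no transcripts")
    return {gene: n / total for gene, n in counts.items()}


def closest_image(test_counts, reference_images):
    """Return the name of the reference image nearest to the test image.

    reference_images maps an image name (e.g. "normal") to its count table.
    Distance is the sum of absolute abundance differences, an assumed measure.
    """
    test = relative_abundance(test_counts)

    def distance(ref_counts):
        ref = relative_abundance(ref_counts)
        genes = set(test) | set(ref)
        return sum(abs(test.get(g, 0.0) - ref.get(g, 0.0)) for g in genes)

    return min(reference_images, key=lambda name: distance(reference_images[name]))
```

With a test image of 8 transcripts of one gene and 2 of another, a reference image split 9 to 1 would be selected over one split 2 to 8, since its abundance profile differs less from the test profile.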

11. The method of claim 9, wherein the biological
specimen is biopsy tissue, sputum, blood or urine.

12. A method of producing a gene transcript image, 
said method comprising the steps of 

(a) obtaining a mixture of mRNA; 

(b) making cDNA copies of the mRNA; 
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; '* a( ; P.y ? fL inserting JtKe -cDNA; idto.rai suitable- vector and 
using said vector to transfect suitable host strain cells 
which are plated out and permitted to grow into clones , 
^each;cl£ne representing Ja^unique ImRNAr^ < 7 C 
5 : - ■ r ^ay^ isolating -M 'r^resentatiiveip^ of 
recombinant clones; ' 

(e) identifying amplified cDNAs from each clone in
the population by a sequence-specific method which
identifies the gene from which the unique mRNA was transcribed;

(f) determining a number of times each gene is
represented within the population of clones as an
indication of relative abundance; and

(g) listing the genes and their relative abundance in
order of abundance, thereby producing the gene transcript
image.
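
Steps (f) and (g) of claim 12 amount to a tally followed by a sorted listing. The sketch below assumes that step (e) yields a mapping from clone identifier to gene; the clone identifiers, gene labels and function name are hypothetical.

```python
from collections import Counter

def ranked_listing(clone_to_gene):
    """List (gene, clone count, relative abundance), most abundant first."""
    counts = Counter(clone_to_gene.values())   # step (f): clones per gene
    total = sum(counts.values())
    # step (g): most_common() returns genes in descending order of count
    return [(gene, n, n / total) for gene, n in counts.most_common()]

# Hypothetical result of step (e) for four clones
clones = {"clone1": "geneA", "clone2": "geneB",
          "clone3": "geneA", "clone4": "geneA"}
listing = ranked_listing(clones)
```

Each row of `listing` pairs a gene with the number of clones representing it and that number as a fraction of all clones examined.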

13. The method of claim 12, further comprising the step
of diagnosing disease by:

repeating steps (a) through (g) on biological
specimens from random samples of normal and diseased humans,
encompassing a variety of diseases, to produce reference 
sets of normal and diseased gene transcript images; 

obtaining a test specimen from a human, and producing 
a test gene transcript image by performing steps (a)
through (g) on said test specimen;

comparing the test gene transcript image with the

reference sets of gene transcript images; and 

identifying at least one of the reference gene 
transcript images which most closely approximates the test 
gene transcript image. 

14. A computer system for analyzing a library of
biological sequences, said system including:

means for receiving a set of transcript sequences, 
where each of the transcript sequences is indicative of a 
different one of the biological sequences of the library;
and

means for processing the transcript sequences in the 
computer system in which a database of reference transcript 
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
sequences indicative of reference biological sequences is
stored, wherein the computer is programmed with software
for generating an identified sequence value for each of the
transcript sequences, where each said identified sequence
value is indicative of a sequence annotation and a degree
of match between a different one of the biological
sequences of the library and at least one of the reference
transcript sequences, and for processing each said
identified sequence value to generate final data values
indicative of a number of times each identified sequence
value is present in the library.
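
The processing recited in claim 14 can be sketched as follows. The claim does not specify a matching algorithm, so Python's general-purpose `difflib.SequenceMatcher` is used purely as a stand-in for a sequence comparison tool. The 0.9 threshold, the function names and the short sequences are all hypothetical assumptions.

```python
from collections import Counter
from difflib import SequenceMatcher

def identify(transcript, references, min_match=0.9):
    """Return an identified sequence value: (annotation, degree of match).

    references maps an annotation to a reference transcript sequence.
    Transcripts whose best match falls below min_match (an assumed
    cutoff) are labelled "unidentified".
    """
    best_annotation, best_score = "unidentified", 0.0
    for annotation, ref_seq in references.items():
        # ratio() is a similarity in [0, 1]; a stand-in for degree of match
        score = SequenceMatcher(None, transcript, ref_seq, autojunk=False).ratio()
        if score > best_score:
            best_annotation, best_score = annotation, score
    if best_score < min_match:
        return "unidentified", best_score
    return best_annotation, best_score

def tally_library(transcripts, references, min_match=0.9):
    """Final data values: number of times each annotation occurs in the library."""
    values = [identify(t, references, min_match) for t in transcripts]
    return Counter(annotation for annotation, _score in values)

# Hypothetical reference database and library of transcript sequences
references = {"geneA": "ATGGCCAAGT", "geneB": "TTGACCGGTA"}
library = ["ATGGCCAAGT", "ATGGCCAAGT", "TTGACCGGTA", "CCCCCCCCCC"]
final_counts = tally_library(library, references)
```

In this sketch the first element of each identified sequence value carries the annotation and the second carries the degree of match. Only the annotation is tallied to produce the final data values.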

15. The system of claim 14, also including:
library generation means for producing the library of
biological sequences, and generating said set of transcript
sequences from said library.

16. The system of claim 15, wherein the library
generation means includes:
means for obtaining a mixture of mRNA;
means for making cDNA copies of the mRNA;
means for inserting the cDNA copies into cells and
permitting the cells to grow into clones;
means for isolating a representative population of the
clones and producing therefrom the library of biological
sequences.
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